Matrix multiplication requires computer systems have huge computing capability and data throughputs as problem size is increased. In this research, an OpenCL-based matrix multiplier with task parallelism is designed and implemented by using the FPGA board DE5a-NET to improve computation throughput and energy efficiency. The matrix multiplier is based on the systolic array architecture with 10 × 16 processing elements (PEs). When data are single-precision floating-point, the proposed matrix multiplier averagely achieves about 785 GFLOPs in computation throughput and 66.75 GFLOPs/W in energy efficiency. Compared with the Intel’s OpenCL example with data parallelism on FPGA, the SGEMM routines in the Intel MKL and OpenBLAS libraries executed on a desktop with 32 GB DDR4 RAMs and an Intel i7-6800K processor running at 3.4 GHz, the proposed matrix multiplier averagely outperforms by 3.2 times, 1.3 times, and 1.6 times in omputation throughput, and by 2.9 times, 10.5 times, and 11.8 times in energy efficiency, respectively, even though the fabrication technology is 20 nm in the FPGA while it is 14 nm in the CPU.
日時: 2020年1月27日（月）、15:30 - 16:10
場所: R-CCS 6階講堂
・講演題目：Design of An FPGA-based Matrix Multiplier