The 189th R-CCS Cafe - part 1

Title

Design of An FPGA-based Matrix Multiplier

Details
Date Mon, Jan 27, 2020
Time 3:30 pm - 4:10 pm
City Kobe, Japan
Place

Lecture Hall (6th floor) at R-CCS

Language Presentation Language: English
Presentation Material: English
Speakers

Tan Yiyu

Large-scale Parallel Numerical Computing Technology Research Team


Abstract

Matrix multiplication requires computer systems to have huge computing capability and data throughput as the problem size increases. In this research, an OpenCL-based matrix multiplier with task parallelism is designed and implemented on the DE5a-NET FPGA board to improve computation throughput and energy efficiency. The matrix multiplier is based on the systolic array architecture with 10 × 16 processing elements (PEs). For single-precision floating-point data, the proposed matrix multiplier achieves on average about 785 GFLOPS in computation throughput and 66.75 GFLOPS/W in energy efficiency. It is compared with three baselines: Intel's OpenCL example with data parallelism on the FPGA, and the SGEMM routines in the Intel MKL and OpenBLAS libraries executed on a desktop with 32 GB of DDR4 RAM and an Intel i7-6800K processor running at 3.4 GHz. On average, the proposed matrix multiplier outperforms these baselines by factors of 3.2, 1.3, and 1.6 in computation throughput, and by factors of 2.9, 10.5, and 11.8 in energy efficiency, respectively, even though the FPGA is fabricated in a 20 nm process while the CPU is fabricated in a 14 nm process.
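To make the systolic array idea concrete, the following is a minimal software sketch in Python and not the speaker's OpenCL design. It assumes an output-stationary arrangement, in which each PE accumulates one element of the result tile, and a simple skewed schedule; both are illustrative assumptions, since the abstract states only the 10 × 16 PE grid. The names `systolic_tile` and `matmul_systolic` are hypothetical.

```python
PE_ROWS, PE_COLS = 10, 16  # PE grid size stated in the abstract


def systolic_tile(a_rows, b_cols):
    """Simulate one pass of an output-stationary PE grid (assumed dataflow).

    a_rows: rows of A fed in from the left, one per PE row.
    b_cols: columns of B fed in from the top, one per PE column.
    PE (i, j) keeps a running sum for one element of the output tile.
    """
    depth = len(a_rows[0])
    acc = [[0.0] * len(b_cols) for _ in a_rows]
    # At step t, PE (i, j) receives A[i][k] and B[k][j] with k = t - i - j,
    # which models the skewed wavefront of operands moving through the grid.
    for t in range(depth + len(a_rows) + len(b_cols) - 2):
        for i, a_row in enumerate(a_rows):
            for j, b_col in enumerate(b_cols):
                k = t - i - j
                if 0 <= k < depth:
                    acc[i][j] += a_row[k] * b_col[k]
    return acc


def matmul_systolic(a, b):
    """Compute C = A * B by sweeping the PE grid over tiles of C."""
    m, n = len(a), len(b[0])
    b_t = [list(col) for col in zip(*b)]  # columns of B as lists
    c = [[0.0] * n for _ in range(m)]
    for r in range(0, m, PE_ROWS):
        for s in range(0, n, PE_COLS):
            tile = systolic_tile(a[r:r + PE_ROWS], b_t[s:s + PE_COLS])
            for i, row in enumerate(tile):
                c[r + i][s:s + len(row)] = row
    return c
```

Calling `matmul_systolic(a, b)` on two nested lists returns the same product as a plain triple loop. In hardware all PEs in the grid work in the same clock cycle, whereas this sketch visits them one after another, so it illustrates the dataflow only and says nothing about performance.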

(Jan 21, 2020)