RIKEN Center for Computational Science Large-scale Parallel Numerical Computing Technology Research Team

Semi-ScaLAPACK-Compatible 2.5D-PxGEMM based on SUMMA (SC-SUMMA-25D)


We are developing a new parallel matrix multiplication routine (the counterpart of PDGEMM in PBLAS) that can achieve good strong scaling on the post-K computer by using the communication-avoiding 2.5D algorithm. The 2.5D algorithm requires a 2.5D matrix distribution, in which copies of a 2D-distributed matrix are stacked over a 3D process grid. To maintain compatibility with the conventional PDGEMM, which operates on matrices distributed in 2D over a 2D process grid, our implementation redistributes the matrices between the 2D and 2.5D distributions before and after the computation (2D-compatible 2.5D-PDGEMM).

We have developed prototype implementations based on Cannon's algorithm and on the SUMMA algorithm, and evaluated their performance on up to 16,384 nodes of the K computer. The results showed that our implementations outperformed conventional 2D-PDGEMMs, including the PBLAS PDGEMM, even when the cost of redistributing the matrices between the 2D and 2.5D distributions was included. For example, with stack size c=4 on 16,384 nodes (matrix size n=32,768), our implementation ran approximately 3.3 times faster than the 2D implementation.
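
To illustrate the idea behind a SUMMA-based 2.5D multiply, the following is a minimal sequential sketch, not the SC-SUMMA-25D code. It assumes the standard 2.5D scheme: P = q*q*c processes form c layers of a q x q grid, every layer holds a full copy of A and B in the same 2D block distribution (the stacked layout), each layer performs only q/c of the q SUMMA broadcast steps, and the partial C blocks are then summed across the layers. The names summa_25d, q and the helper functions are hypothetical; c plays the role of the stack size mentioned above. The 2D to 2.5D redistribution step and real MPI communication are not modeled.

```python
# Sequential sketch of a 2.5D SUMMA-style multiply C = A * B.
# Hypothetical illustration, not the SC-SUMMA-25D code: P = q*q*c
# "processes" are simulated as c layers of a q x q grid.

def zeros(m):
    """Return an m x m matrix of zeros."""
    return [[0.0] * m for _ in range(m)]


def matmul(a, b):
    """Naive dense product of two list-of-lists matrices."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def block(mat, bi, bj, bs):
    """Extract the (bi, bj) block of size bs x bs from a square matrix."""
    return [row[bj * bs:(bj + 1) * bs] for row in mat[bi * bs:(bi + 1) * bs]]


def add_into(acc, x):
    """acc += x, elementwise, in place."""
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            acc[i][j] += v


def summa_25d(a, b, q, c):
    """Multiply n x n matrices a and b on a simulated q x q x c grid.

    Assumes n % q == 0 and q % c == 0, and that every layer already holds
    the same 2D block distribution of a and b (the stacked 2.5D layout).
    """
    n = len(a)
    if n % q != 0 or q % c != 0:
        raise ValueError("need n % q == 0 and q % c == 0")
    bs = n // q                # block size owned by each process
    steps = q // c             # SUMMA steps executed by each layer
    # partial[l][i][j] is the C block held by process (i, j) on layer l.
    partial = [[[zeros(bs) for _ in range(q)] for _ in range(q)] for _ in range(c)]
    for l in range(c):
        # Layer l handles only its slice of the q SUMMA (outer-product) steps.
        for k in range(l * steps, (l + 1) * steps):
            for i in range(q):
                a_ik = block(a, i, k, bs)      # broadcast along process row i
                for j in range(q):
                    b_kj = block(b, k, j, bs)  # broadcast along process column j
                    add_into(partial[l][i][j], matmul(a_ik, b_kj))
    # Reduce the partial C blocks across the c layers and assemble the result.
    result = zeros(n)
    for i in range(q):
        for j in range(q):
            acc = zeros(bs)
            for l in range(c):
                add_into(acc, partial[l][i][j])
            for r in range(bs):
                result[i * bs + r][j * bs:(j + 1) * bs] = acc[r]
    return result


if __name__ == "__main__":
    n = 8
    A = [[float(i + 2 * j) for j in range(n)] for i in range(n)]
    B = [[float((i * j) % 5) for j in range(n)] for i in range(n)]
    for c in (1, 2, 4):
        assert summa_25d(A, B, q=4, c=c) == matmul(A, B)
    print("2.5D result matches the reference product")
```

With c=1 the sketch reduces to plain 2D SUMMA. With larger c, each layer executes only q/c of the broadcast steps, which is where the communication saving comes from, at the price of storing c copies of A and B and performing a final reduction across the stack.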




Copyright © 2018-2020 Large-scale Parallel Numerical Computing Technology Research Team, RIKEN Center for Computational Science, All rights reserved.