R-CCS Cafe
The 199th R-CCS Cafe, Part 2
Recent topics of High Precision and Low Precision Computing in HPC
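Date | Monday, October 5, 2020 |
Time | 16:40 - 17:00 (17:20 - 17:40: free discussion with the speakers, preceded by a 1-2 minute break) |
City | Online |
Venue | Remote seminar via BlueJeans |
Language | Presentation and slides both in English |
Speaker |
Toshiyuki Imamura, Team Leader, Large-scale Parallel Numerical Computing Technology Research Team |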
For the HPC community, computational speed and computational accuracy are generally considered to be conflicting goals. However, the diversification and improvement of hardware and the high productivity of software now allow users to choose a precision that meets the required computational accuracy. This may bring enormous changes to scientific and technical computing, which has long been dominated by double-precision calculation. In this seminar, I will introduce recent topics such as the relationship between high performance and precision and the relationship between modern hardware and computational accuracy, focusing mainly on the numerical libraries developed by my team in three areas: i) building higher-precision software on massive, high-performance low-precision computing units, ii) algorithmic advances for lower-precision units in scientific computing, such as the HPL-AI benchmark, and iii) the idea of minimal-precision computing.
The first is the realization of a DGEMM-equivalent matrix product using Tensor Cores (TC) by Mukunoki et al. This is an important result in two respects. It is one of the academic case studies of TC utilization. It also suggests the possibility of controlling the number of double-precision units by installing a sufficient number of low-precision arithmetic units.
The second refers to our HPL-AI result. HPL-AI is, of course, one of the world's four crown benchmarks, and its computation is based on mixed precision using the FP16, FP32, and FP64 formats. The essential point of HPL-AI is to bring out the high performance of low-precision arithmetic while preventing the numerical instability and inaccuracy that low-precision arithmetic can cause. It is not simply a matter of rewriting double as half; it is accomplished through a preliminary analysis of the computation target and patterns.
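As a rough illustration of the mixed-precision idea behind HPL-AI-style solvers, the following is a minimal sketch of textbook iterative refinement: the expensive factorization and triangular solves are rounded to FP16, while the residual and the solution update stay in FP64. It is not the team's HPL-AI implementation. All function names (`to_low`, `lu_low`, `solve_low`, `refine`) and the test matrix are hypothetical, FP16 is emulated with Python's `struct` format `'e'`, and the LU factorization omits pivoting on the assumption that the matrix is diagonally dominant.

```python
import struct

def to_low(x, fmt="e"):
    """Round an FP64 value to a lower-precision IEEE format ('e' = FP16, 'f' = FP32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def lu_low(A, fmt="e"):
    """LU factorization without pivoting; every result is rounded to low precision."""
    n = len(A)
    LU = [[to_low(v, fmt) for v in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] = to_low(LU[i][k] / LU[k][k], fmt)
            for j in range(k + 1, n):
                LU[i][j] = to_low(LU[i][j] - to_low(LU[i][k] * LU[k][j], fmt), fmt)
    return LU

def solve_low(LU, b, fmt="e"):
    """Forward and backward substitution with the low-precision factors."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # L has a unit diagonal
        s = to_low(b[i], fmt)
        for j in range(i):
            s = to_low(s - to_low(LU[i][j] * y[j], fmt), fmt)
        y[i] = s
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for j in range(i + 1, n):
            s = to_low(s - to_low(LU[i][j] * x[j], fmt), fmt)
        x[i] = to_low(s / LU[i][i], fmt)
    return x

def residual(A, x, b):
    """Residual b - A x, computed in FP64 (ordinary Python floats)."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]

def refine(A, b, fmt="e", max_iters=20, tol=1e-14):
    """Low-precision solve followed by FP64 iterative refinement."""
    LU = lu_low(A, fmt)
    x = solve_low(LU, b, fmt)
    history = []  # max-norm of the residual at each iterate
    for _ in range(max_iters):
        r = residual(A, x, b)
        rmax = max(abs(v) for v in r)
        history.append(rmax)
        if rmax <= tol:
            break
        # Scale the residual so that it does not underflow in FP16.
        d = solve_low(LU, [v / rmax for v in r], fmt)
        x = [xi + rmax * di for xi, di in zip(x, d)]
    return x, history

# Hypothetical diagonally dominant test problem.
A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 6.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x, history = refine(A, b)
print("residual after FP16 solve only:", history[0])
print("residual after refinement:     ", history[-1])
```

The residual scaling step hints at why such methods are more than a change of data type: an unscaled FP64 residual would soon fall below the smallest values FP16 can represent, and the correction would be lost.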
The third is to promote a minimal system of computation (minimal-precision computing), which is anticipated to change the storage capacity, energy consumption, and minimum hardware requirements of current floating-point units. Users will not feel a big impact in terms of input/output, but the internal design of computers will be significantly enhanced.
