Large-scale Parallel Numerical Computing Technology Research Team
Team Leader: Toshiyuki Imamura
- imamura.toshiyuki[at]riken.jp (Lab location: Kobe)
- Please change [at] to @
- 2012
- Team Leader, Large-scale Parallel Numerical Computing Technology Research Team, AICS (renamed R-CCS in 2018), RIKEN (to present)
- 2003
- Assistant Professor, University of Electro-Communications
- 2001
- Guest Scientist, High Performance Computing Center Stuttgart
- 1996
- Researcher, Japan Atomic Energy Research Institute
- 1996
- Graduated from the Department of Applied Systems Science, Graduate School of Engineering, Kyoto University
Keywords
- Parallel Algorithms
- High Performance Computing
- Numerical Linear Algebra
- Mixed-precision Numerical Computing
- Minimal Precision Computing
Research summary
The Large-scale Parallel Numerical Computing Technology Research Team conducts research and development of a large-scale, highly parallel, high-performance numerical software library for the supercomputer Fugaku. Simulation programs require a variety of numerical algorithms for linear systems, eigenvalue problems, fast Fourier transforms, and nonlinear equations. To exploit the full potential of Fugaku, we must select algorithms and develop a numerical software library based on the concepts of high parallelism, high performance, high precision, resiliency, and scalability. We achieve this through close collaboration among computational science (simulation), computer science (hardware and software), and numerical mathematics. Our goal is to establish fundamental techniques for developing numerical software libraries for next-generation supercomputer systems, collectively called KMATHLIB, through strong cooperation within R-CCS.
Main research results
World's Largest Dense Eigenvalue Computation
The solution of real symmetric dense eigenvalue problems is one of the fundamental matrix computations. Several new high-performance eigensolvers have been developed for petascale and post-petascale systems; one of them, the EigenExa eigensolver, was developed in Japan. EigenExa provides two routines: eigen_s, which is based on traditional tridiagonalization, and eigen_sx, which instead reduces the matrix to pentadiagonal form. We conducted a detailed performance evaluation of EigenExa using 4,800 nodes of the Oakleaf-FX supercomputer system, focusing mainly on the differences between the two routines.
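The difference between the two reduction paths can be illustrated with a small serial sketch. The function `reduce_to_band` below is a hypothetical helper, not part of EigenExa's API and not its algorithm: it assumes NumPy and reduces a real symmetric matrix to band form by orthogonal similarity transforms built from QR factorizations of successive column panels. Half-bandwidth `b = 1` yields a tridiagonal matrix (the form eigen_s works through) and `b = 2` a pentadiagonal one (the form eigen_sx works through). Blocking, parallel communication, and the subsequent band eigensolver are all omitted.

```python
import numpy as np


def reduce_to_band(A, b):
    """Return a matrix orthogonally similar to the symmetric matrix A whose
    nonzeros lie within half-bandwidth b (b=1: tridiagonal, b=2: pentadiagonal).

    Illustrative serial sketch (hypothetical helper, not EigenExa code).
    """
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    for k in range(0, n - b, b):
        cols = slice(k, k + b)   # current panel of b columns
        rows = slice(k + b, n)   # rows below the band in that panel
        # Factor the sub-band part of the panel: P = Q R
        Q, R = np.linalg.qr(A[rows, cols], mode="complete")
        # Q^T P = R is upper trapezoidal, so only entries inside the band remain
        A[rows, cols] = R
        A[cols, rows] = R.T      # keep the matrix symmetric
        # Apply the same similarity transform to the trailing submatrix
        A[rows, rows] = Q.T @ A[rows, rows] @ Q
    return A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 8))
    A = M + M.T                   # random symmetric test matrix
    T = reduce_to_band(A, 1)      # tridiagonal (eigen_s-style target form)
    P = reduce_to_band(A, 2)      # pentadiagonal (eigen_sx-style target form)
    # Orthogonal similarity preserves the spectrum
    print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(T)))
    print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(P)))
```

Because every step is an orthogonal similarity transform, the band matrix has the same eigenvalues as the input. With `b = 2`, each step processes a two-column panel at once, which hints at why a wider band can shift more of the work into matrix-matrix operations; the real trade-offs are what the Oakleaf-FX evaluation measured.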
The results clearly show both the advantages and the disadvantages of eigen_sx relative to eigen_s, which will guide further performance improvements of EigenExa. The results are also expected to be useful for other parallel dense matrix computations beyond eigenvalue problems. Using EigenExa on all 82,944 nodes (processors) of the K computer, we solved the world's largest dense eigenvalue problem, of dimension one million, in 3,464 seconds. EigenExa achieved 1.7 PFLOPS (16% of the K computer's peak performance), the world's highest performance for solving a dense eigenvalue problem.
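The 16% figure can be checked with a back-of-the-envelope calculation. This sketch assumes a theoretical peak of 128 GFLOPS per node (the SPARC64 VIIIfx processor of the K computer) and takes the peak over the 82,944 nodes used for the run; the last line simply multiplies the reported rate by the reported time.

```latex
\begin{aligned}
R_{\text{peak}} &= 82{,}944 \times 128\ \text{GFLOPS} \approx 10.62\ \text{PFLOPS},\\
\frac{R_{\text{sustained}}}{R_{\text{peak}}} &= \frac{1.7\ \text{PFLOPS}}{10.62\ \text{PFLOPS}} \approx 0.16,\\
\text{work} &\approx 1.7 \times 10^{15}\ \text{flop/s} \times 3{,}464\ \text{s} \approx 5.9 \times 10^{18}\ \text{flop}.
\end{aligned}
```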
Representative papers
- Shuhei Kudo, Yusaku Yamamoto, Toshiyuki Imamura, "Error Analysis of the Cholesky QR-Based Block Orthogonalization Process for the One-Sided Block Jacobi SVD Algorithm," Computing and Informatics, Vol. 39(6), pp. 1203–1228, 2021. https://doi.org/10.31577/cai_2020_6_1203
- Hisashi Yashiro, Koji Terasaki, Yuta Kawai, Shuhei Kudo, Takemasa Miyoshi, Toshiyuki Imamura, Kazuo Minami, Hikaru Inoue, Tatsuo Nishiki, Takayuki Saji, Masaki Satoh, Hirofumi Tomita, "A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations," Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20), IEEE Press, Article 1, pp. 1–10, 2020.
- Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, "DGEMM using Tensor Cores, and Its Accurate and Reproducible Versions," ISC High Performance 2020, Lecture Notes in Computer Science, Vol. 12151, pp. 230–248, Jun. 2020. https://link.springer.com/chapter/10.1007/978-3-030-50743-5_12
- Toshiyuki Imamura, Takeshi Fukaya, Yusuke Hirota, Susumu Yamada, Masahiko Machida, "CAHTR: Communication-Avoiding Householder TRidiagonalization," Proc. ParCo2015, Advances in Parallel Computing, Vol. 27: Parallel Computing: On the Road to Exascale, pp. 381–390, 2016.
- Yusuke Hirota, Toshiyuki Imamura, "Divide-and-Conquer Method for Banded Generalized Eigenvalue Problems," Journal of Information Processing Computing System, Vol. 52, Nov. 20, 2015.
- Takeshi Fukaya, Toshiyuki Imamura, "Performance evaluation of the EigenExa eigensolver on Oakleaf-FX: tridiagonalization versus pentadiagonalization," 2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp. 960–969, May 25, 2015.
- T. Imamura, "The EigenExa Library - High Performance & Scalable Direct Eigensolver for Large-Scale Computational Science," International Supercomputing Conference (ISC14), Leipzig, June 2014 (invited talk).
- Y. Idomura, M. Nakata, S. Yamada, M. Machida, T. Imamura, T. Watanabe, M. Nunami, H. Inoue, S. Tsutsumi, I. Miyoshi, N. Shida, "Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer," International Journal of High Performance Computing Applications, 28(1), pp. 73–86, 2014, SAGE Publications. doi: 10.1177/1094342013490973