Large-scale Parallel Numerical Computing Technology Research Team
Team Leader Toshiyuki Imamura
imamura.toshiyuki[at]riken.jp (Lab location: Kobe)
- Please change [at] to @
- 2012: Team Leader, Large-scale Parallel Numerical Computing Technology Research Team, AICS (renamed R-CCS in 2018), RIKEN (to present)
- 2001: Guest Scientist, High Performance Computing Center Stuttgart
- 2003: Assistant Professor, University of Electro-Communications
- 1996: Researcher, Japan Atomic Energy Research Institute
- 1996: Graduated from Applied Systems and Science, Graduate School, Division of Engineering, Kyoto University
Keyword
- Parallel Algorithms
Research summary
The Large-scale Parallel Numerical Computing Technology Research Team conducts research and development of a large-scale, highly parallel, high-performance numerical software library for the K computer. Simulation programs require various numerical algorithms: solvers for linear systems, eigenvalue problems, and non-linear equations, as well as fast Fourier transforms. To take advantage of the full potential of the K computer, we must select algorithms and develop a numerical software library based on the concepts of high parallelism, high performance, high precision, resiliency, and scalability. We achieve this through close collaboration among computational science (simulation), computer science (hardware and software), and numerical mathematics. Our goal is to establish a fundamental technique for developing numerical software libraries, called KMATHLIB, for next-generation supercomputer systems, based on strong cooperation within R-CCS.
Main research results
World's Largest Dense Eigenvalue Computation
Solving real symmetric dense eigenvalue problems is one of the fundamental matrix computations. To date, several new high-performance eigensolvers have been developed for petascale and post-petascale systems. One of these, the EigenExa eigensolver, has been developed in Japan. EigenExa provides two routines: eigen_s, which is based on traditional tridiagonalization, and eigen_sx, which employs a new method that reduces the matrix to pentadiagonal form instead. We recently conducted a detailed performance evaluation of EigenExa using 4,800 nodes of the Oakleaf-FX supercomputer system. We report the results of this evaluation, which focuses mainly on the differences between the two routines.
The results clearly indicate both the advantages and disadvantages of eigen_sx relative to eigen_s, which will contribute to further performance improvement of EigenExa. The results are also expected to be useful for other parallel dense matrix computations beyond eigenvalue problems. We have successfully solved a world-largest-scale dense eigenvalue problem (dimension one million) with EigenExa, using all nodes (82,944 processors) of the K computer, in 3,464 seconds. EigenExa achieved 1.7 PFLOPS (16% of the K computer’s peak performance), the world's highest performance for solving an eigenvalue problem of a dense matrix.
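As a small illustration of the traditional tridiagonalization route mentioned above for eigen_s, the following sketch reduces a real symmetric matrix to tridiagonal form with Householder reflections and then computes the eigenvalues of the reduced matrix. It is a serial NumPy sketch for intuition only and is not EigenExa's implementation or API. The function names householder_tridiagonalize and eigvals_via_tridiagonal are hypothetical, and a general dense solver (np.linalg.eigvalsh) stands in for the dedicated tridiagonal solver a real library would use.

```python
import numpy as np


def householder_tridiagonalize(a):
    """Return a tridiagonal matrix orthogonally similar to symmetric `a`.

    Hypothetical helper for illustration; not part of EigenExa.
    """
    t = np.array(a, dtype=float)  # work on a copy
    n = t.shape[0]
    for k in range(n - 2):
        x = t[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column is already in the desired form
        # Householder vector v: the reflector I - 2 v v^T maps x onto a
        # multiple of the first unit vector.
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])
        v /= np.linalg.norm(v)
        # Apply the reflector from the left and from the right so that the
        # result stays symmetric and similar to the input matrix.
        t[k + 1:, k:] -= 2.0 * np.outer(v, v @ t[k + 1:, k:])
        t[k:, k + 1:] -= 2.0 * np.outer(t[k:, k + 1:] @ v, v)
    return t


def eigvals_via_tridiagonal(a):
    """Eigenvalues of a real symmetric matrix via tridiagonal reduction."""
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or not np.allclose(a, a.T):
        raise ValueError("expected a real symmetric square matrix")
    t = householder_tridiagonalize(a)
    d = np.diag(t)        # main diagonal
    e = np.diag(t, k=-1)  # sub-diagonal
    # Rebuild an exactly tridiagonal matrix, discarding rounding-level
    # entries outside the three central diagonals.
    tri = np.diag(d) + np.diag(e, k=-1) + np.diag(e, k=1)
    # Stand-in for a dedicated tridiagonal eigensolver (assumption).
    return np.linalg.eigvalsh(tri)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal((6, 6))
    s = (m + m.T) / 2.0
    print(eigvals_via_tridiagonal(s))
```

The pentadiagonal route used by eigen_sx would stop the reduction at a wider band (two sub-diagonals) and solve that banded problem instead; the sketch above does not attempt to reproduce it.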
Representative papers
- Toshiyuki Imamura, Takeshi Fukaya, Yusuke Hirota, Susumu Yamada, and Masahiko Machida:
"CAHTR: Communication-Avoiding Householder TRidiagonalization"
Proc. ParCo2015, Advances in Parallel Computing, Vol. 27: Parallel Computing: On the Road to Exascale, pp. 381-390, 2016.
- Yusuke Hirota and Toshiyuki Imamura:
"Divide-and-Conquer Method for Banded Generalized Eigenvalue Problems"
Journal of Information Processing Computing System, Vol. 52, Nov. 20, 2015.
- Takuma Kawamura, Yasuhiro Idomura, Hiroko Miyamura, Toshiyuki Imamura, and Hiroshi Takemiya:
"Visualization technique for large-scale data by particle-based volume rendering"
Transactions of ISCIE, Vol. 28, No. 5, pp. 221-227, May 15, 2015.
- Seikichi Matsuoka, Shinsuke Satake, Yasuhiro Idomura, and Toshiyuki Imamura:
"Quality and Performance of a Pseudo-Random Number Generator in Massively Parallel Plasma Particle Simulations"
Proceedings of ANS MC2015 - Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method.
- Takeshi Fukaya and Toshiyuki Imamura:
"Performance evaluation of the EigenExa eigensolver on Oakleaf-FX: tridiagonalization versus pentadiagonalization"
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, pp. 960-969, May 25, 2015.
- T. Imamura:
"The EigenExa Library - High Performance & Scalable Direct Eigensolver for Large-Scale Computational Science"
International Supercomputing Conference (ISC14), Leipzig, June 2014. (invited talk)
- D. Mukunoki, T. Imamura, and D. Takahashi:
"Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs"
23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), March 2015.
- T. Fukaya, Y. Nakatsukasa, Y. Yanagisawa, and Y. Yamamoto:
"CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System"
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2014.
- T. Miyoshi, K. Kondo, and T. Imamura:
"The 10,240-member ensemble Kalman filtering with an intermediate AGCM"
Geophysical Research Letters, Vol.41 (2014).
- Y. Idomura, M. Nakata, S. Yamada, M. Machida, T. Imamura, T. Watanabe, M. Nunami, H. Inoue, S. Tsutsumi, I. Miyoshi, and N. Shida:
"Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer"
International Journal of High Performance Computing Applications, 28(1) 73-86 (2014), SAGE publications, doi: 10.1177/1094342013490973.