Processor Research Team
Team Leader Kentaro Sano
kentaro.sano[at]riken.jp (Lab location: Kobe)
- Please change [at] to @
- 2024: Unit Leader, Advanced AI Device Development Unit, AI for Science Platform Division, R-CCS, RIKEN (-present)
- 2017: Team Leader, Processor Research Team, AICS (renamed R-CCS in 2018), RIKEN (-present)
- 2006: Visiting Researcher, Imperial College, London
- 2005: Associate Professor, Graduate School of Information Sciences, Tohoku University
- 2001: Assistant Professor, Graduate School of Information Sciences, Tohoku University
- 2000: Assistant Professor, Graduate School of Engineering, Tohoku University
- 2000: Graduated from Computer and Mathematical Sciences, Graduate School of Information Sciences, Tohoku University
Keywords
- Computer Architecture
- Parallel Processing System
- Reconfigurable Computing
- High-Level Synthesis
- Hardware Algorithms
Research summary
High-performance computing on a supercomputer such as the K computer or the supercomputer Fugaku requires a huge number of computing nodes that cooperate by communicating over an inter-node network. However, the overhead of global communication and synchronization among the nodes can considerably degrade overall performance. We are developing computing accelerators that achieve large-scale processing with less performance degradation by introducing a new parallel computing model based on a "data-flow" model, in which communication and synchronization are localized. We are also developing data-flow accelerators whose custom-computing circuits are automatically generated for each target application by a high-level synthesis compiler. Such customized hardware structures allow us to achieve high-performance processing even for applications that conventional CPUs handle poorly. These research results are helping to advance the use of the K computer and aiding the exploration of new computing models and architectures for future supercomputers.
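The idea of localized communication in a data-flow model can be illustrated with a small software sketch. This is not the team's compiler output or hardware design; it is a toy model, assuming a 1D three-point averaging stencil, in which each pipeline stage only consumes values from the stage directly upstream and emits a result as soon as its inputs are available. The names `stencil_stage` and `pipeline` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def stencil_stage(stream: Iterable[float]) -> Iterator[float]:
    """One pipeline stage: 3-point average over a stream of grid values.

    Boundary values are passed through unchanged. The stage keeps only a
    3-element window (a stand-in for a small on-chip shift register), so it
    never needs global access to the grid.
    """
    window: deque = deque(maxlen=3)
    for value in stream:
        window.append(value)
        if len(window) == 1:
            yield value                  # left boundary passes through
        elif len(window) == 3:
            yield sum(window) / 3.0      # interior point, once both neighbours arrived
    if len(window) >= 2:
        yield window[-1]                 # right boundary passes through


def pipeline(stream: Iterable[float], steps: int) -> Iterator[float]:
    """Chain `steps` stages; each stage talks only to its direct neighbour."""
    for _ in range(steps):
        stream = stencil_stage(stream)
    return iter(stream)


if __name__ == "__main__":
    grid = [0.0, 0.0, 3.0, 0.0, 0.0]
    print(list(pipeline(grid, steps=1)))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Because every stage is a generator, chaining more stages corresponds to computing more time steps per pass over the data without any stage synchronizing with anything other than its immediate predecessor, which is the property the data-flow model relies on.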
Main research results
Low-power and high-performance numerical computing using our own hardware compiler to generate custom-computing acceleration
As the advancement of semiconductor technology based on Moore’s Law slows down, it will become difficult to improve computing performance with multi-core microprocessors in the near future. One promising solution to this problem is a reconfigurable custom computing machine, in which the software code of a target application is converted into customized accelerator hardware that is implemented and executed on field-programmable gate arrays (FPGAs).
To date, we have developed a high-level synthesis compiler that generates stream-computing hardware modules based on a data-flow computing model, as well as a system that executes high-performance computing with the generated modules implemented on FPGAs. In the case of a tsunami simulation, for instance, FPGAs achieved twice the sustained performance and an eight-fold improvement in power efficiency compared with GPUs. These improvements were achieved by employing efficient subsystem structures tailored to the target application, including customized memory subsystems and more deeply pipelined data paths. In addition, we have developed a real-time data-compression hardware module using multiple FPGAs to enhance memory and network bandwidth for high-performance computing.
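As a quick sanity check on the tsunami-simulation figures above, the two ratios can be combined to estimate relative power draw. This is a back-of-the-envelope sketch, assuming the eight-fold figure refers to performance per watt (the page does not define the term) and that both ratios were measured on the same workload; the function name is hypothetical.

```python
def implied_power_ratio(perf_ratio: float, perf_per_watt_ratio: float) -> float:
    """Return FPGA power divided by GPU power implied by the two ratios.

    perf_per_watt_ratio = perf_ratio / power_ratio, so
    power_ratio = perf_ratio / perf_per_watt_ratio.
    """
    if perf_per_watt_ratio <= 0:
        raise ValueError("perf_per_watt_ratio must be positive")
    return perf_ratio / perf_per_watt_ratio


# Assumed reading of the reported figures: 2x performance, 8x performance per watt.
ratio = implied_power_ratio(perf_ratio=2.0, perf_per_watt_ratio=8.0)
print(ratio)  # 0.25
```

Under these assumptions, the FPGA system would draw roughly one quarter of the GPU system's power while delivering twice the sustained performance.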
Going forward, we will further advance these developments and also develop a new system that makes it easy to achieve high performance on massively large-scale, complex computers. We aim to establish a new computing model and architecture for high-performance computing in the Post-Moore era.

Representative papers
- Artur Podobas, Kentaro Sano, and Satoshi Matsuoka:
"A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective"
IEEE Access, Vol.8, pp.146719-146743, DOI: 10.1109/ACCESS.2020.3012084 (2020).
- Artur Podobas, Kentaro Sano, and Satoshi Matsuoka:
"A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures"
Proceedings of the 31st IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp.1-8, DOI: 10.1109/ASAP49362.2020.00010 (2020).
- Antoniette Mondigo, Tomohiro Ueno, Kentaro Sano, and Hiroyuki Takizawa:
"Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs"
IEICE Transactions on Information and Systems (Special Section on Reconfigurable Systems), Vol.E102-D, No.5, pp.1029-1036 (2019).
- Antoniette Mondigo, Kentaro Sano, and Hiroyuki Takizawa:
"Performance Estimation of Deeply Pipelined Fluid Simulation on Multiple FPGAs with High-speed Communication Subsystem"
Proceedings of 29th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 4 pages (2018).
- Tomohiro Ueno, Kentaro Sano, and Takashi Furusawa:
"Performance Analysis of Hardware-Based Numerical Data Compression on Various Data Formats"
Proceedings of the Data Compression Conference (DCC), pp.345-354 (2018).
- Kentaro Sano and Satoru Yamamoto:
"FPGA-based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks"
IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol.28, No.10, pp.2823-2837 (2017).
- Tomohiro Ueno, Kentaro Sano, and Satoru Yamamoto:
"Bandwidth Compression of Floating-Point Numerical Data Streams for FPGA-Based High-Performance Computing"
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol.10, No.3, Article No.18 (2017).
- Kohei Nagasu, Kentaro Sano, Fumiya Kono, and Naohito Nakasato:
"FPGA-based Tsunami Simulation: Performance Comparison with GPUs, and Roofline Model for Scalability Analysis"
Journal of Parallel and Distributed Computing, Vol.106, pp.153-169 (2017).
- Kentaro Sano, Yoshiaki Hatsuda, and Satoru Yamamoto:
"Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory-Bandwidth"
IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol.25, No.3, pp.695-705, DOI: 10.1109/TPDS.2013.51 (2014).
- Kentaro Sano, Wang Luzhou, Yoshiaki Hatsuda, Takanori Iizuka, and Satoru Yamamoto:
"FPGA-Array with Bandwidth-Reduction Mechanism for Scalable and Power-Efficient Numerical Simulations based on Finite Difference Methods"
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol.3, No.4, Article No.21, DOI: 10.1145/1862648.1862651 (2010).