Date and time: Tuesday, September 17, 2019, 15:30 - 16:10
Venue: Auditorium, 6th floor, R-CCS
・Title: Application Specific Multi-Threading for Heterogeneous Systems using High-Level-Synthesis from C code
・Speaker: Jens Huthmann (Architecture Research Team, Postdoctoral Researcher)
The performance improvement of conventional processors has begun to stagnate in recent years. Because of this, researchers are looking for new ways to improve the performance of computing systems. Heterogeneous systems have turned out to be a powerful option. In the context of this talk, a heterogeneous system consists of a software-programmable processor and an FPGA-based configurable hardware accelerator.
Due to their increased complexity, it is harder to develop applications for heterogeneous systems than for conventional systems based on a software-programmable processor. Different languages have to be used to program the software and hardware parts, and additional specialised hardware knowledge is required. Both factors increase the development cost.
This work presents the compiler framework Nymble, which makes it possible to program a heterogeneous system with only a single high-level language. In the high-level language, the developer only has to select which parts of the application should be executed in hardware. Nymble then generates a program for the software processor, the configuration of the hardware, and all interfaces between software and hardware.
To hide long memory access latencies, this talk presents an execution model which allows the simultaneous execution of multiple threads in a single accelerator. Additionally, the model enables threads to be dynamically reordered at specific points in the common accelerator pipeline. This capability is used to let other (non-waiting) threads overtake a thread which is waiting for a memory access. Thus, these other threads can execute their calculations independently of the waiting thread to bridge the latency of memory accesses.
The presented execution model dynamically spreads multiple threads over the pipeline, which raises the utilisation of the accelerator's resources. Furthermore, the simultaneous execution of multiple threads can achieve throughput similar to that of multiple copies of a single-threaded accelerator running in parallel.
The model makes it possible to combine the improved throughput of multiple copies with the increased efficiency of simultaneous threads in a single accelerator. Thread reordering allows the new model to be used effectively with a cached shared memory.
In a comparison between four copies of a single-threaded accelerator and a multi-threaded accelerator with four threads (both created by Nymble), a resource efficiency improvement of up to 2.6x can be achieved. At the same time, four simultaneous threads can be up to 4x as fast as four threads executed consecutively on a single accelerator. Compared to other, more optimised compilers, Nymble can still achieve up to 2x faster runtime with 1.5x the resource efficiency.
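As a rough illustration of the latency-hiding idea in the abstract above, the following toy model compares threads run one after another with threads that may overtake each other while one of them waits for memory. It is only a sketch under strong assumptions and is not Nymble's execution model: the accelerator is reduced to a single compute resource, every thread alternates a fixed compute phase with a fixed memory wait, and all names and cycle counts are hypothetical.

```python
import heapq


def consecutive_cycles(n_threads, phases, compute, mem_latency):
    """Threads run one after another; every memory wait stalls the accelerator."""
    return n_threads * phases * (compute + mem_latency)


def interleaved_cycles(n_threads, phases, compute, mem_latency):
    """Threads share the accelerator; a waiting thread lets ready threads go first.

    Toy assumption: the compute resource serves one thread at a time, while
    memory waits of different threads overlap freely.
    """
    # Heap entries: (cycle at which the thread is ready, thread id, phases left)
    ready = [(0, tid, phases) for tid in range(n_threads)]
    heapq.heapify(ready)
    compute_free_at = 0
    finish = 0
    while ready:
        ready_at, tid, remaining = heapq.heappop(ready)
        start = max(ready_at, compute_free_at)
        compute_free_at = start + compute
        data_back_at = compute_free_at + mem_latency
        finish = max(finish, data_back_at)
        if remaining > 1:
            heapq.heappush(ready, (data_back_at, tid, remaining - 1))
    return finish


if __name__ == "__main__":
    one_by_one = consecutive_cycles(4, 10, compute=1, mem_latency=3)
    overlapped = interleaved_cycles(4, 10, compute=1, mem_latency=3)
    print(one_by_one, overlapped)  # prints "160 43"
```

With these assumed numbers the four overlapped threads keep the compute resource busy in every cycle, so the speed-up approaches, but does not exceed, the number of threads.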
Date and time: Tuesday, September 17, 2019, 16:10 - 16:50
Venue: Auditorium, 6th floor, R-CCS
・Title: Using Field-Programmable Gate Arrays to Explore Different Numerical Representation: A Use-Case on POSITs
・Speaker: Artur Podobas (Processor Research Team, Postdoctoral Researcher)
The inevitable end of Moore’s law motivates researchers to re-think many of the historical architectural decisions. Among these decisions we find the representation of floating-point numbers, which has remained unchanged for nearly three decades. Chasing better performance, lower power consumption or improved accuracy, researchers today are actively searching for smaller and/or better representations. Today, a multitude of different representations can be found, both for specialized applications (e.g. deep learning) and for general-purpose applications (e.g. POSITs).
However, despite their claimed strengths, alternative representations remain difficult to evaluate empirically. There are software approaches and emulation libraries available, but their sluggishness only allows the smallest of inputs to be evaluated and understood.
POSIT is a new numerical representation, introduced by Professor John Gustafson in 2017 as a candidate to replace the traditional IEEE-754 representation. In this talk I will present my experience in designing, building and accelerating the POSIT numerical representation on Field-Programmable Gate Arrays (FPGAs). I will start by briefly introducing the POSIT representation, then show its hardware implementation details, reason about its trade-offs (with respect to IEEE-754), and conclude the presentation with small use-cases and their measured performance.
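For readers unfamiliar with the format, the sketch below decodes a posit bit pattern in software, following the commonly published layout: a sign bit, a variable-length regime, up to `es` exponent bits, and a fraction with an implicit leading 1. It is an illustrative software decoder, not the hardware design presented in the talk; the function name and its defaults (8 bits, `es = 0`) are assumptions chosen for brevity.

```python
def decode_posit(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR ("not a real"); NaN is the closest float
    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & mask  # negative posits are stored in two's complement
    # Regime: a run of identical bits after the sign bit, ended by the opposite bit.
    i = n - 2
    first = (bits >> i) & 1
    run = 0
    while i >= 0 and ((bits >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1  # skip the terminating bit (absent if the regime fills the word)
    # Exponent: up to es bits; bits cut off by the end of the word count as zero.
    exponent = 0
    for _ in range(es):
        exponent <<= 1
        if i >= 0:
            exponent |= (bits >> i) & 1
            i -= 1
    # Fraction: whatever bits remain, with an implicit leading 1.
    frac_len = max(i + 1, 0)
    frac = bits & ((1 << frac_len) - 1)
    significand = 1.0 + frac / (1 << frac_len)
    value = significand * 2.0 ** (k * (1 << es) + exponent)
    return -value if negative else value


if __name__ == "__main__":
    for pattern in (0x01, 0x20, 0x40, 0x50, 0x60, 0x7F, 0xC0):
        print(f"{pattern:08b} -> {decode_posit(pattern)}")
```

The variable-length regime is what gives posits their tapered precision: values near 1 keep many fraction bits, while very large or very small values spend those bits on the regime instead.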
Date and time: Monday, September 2, 2019, 13:00 - 13:55
Venue: Auditorium, 6th floor, R-CCS
・Title: Current status of FDPS/Processor design from HPC perspective
・Speaker: Jun Makino (Particle Simulator Research Team, Team Leader)
This talk will consist of two parts. In part I, I'll give an overview of the current status of FDPS (Framework for Developing Particle Simulator). FDPS provides an easy and highly efficient way to develop parallel programs for particle-based simulations, through extensive use of metaprogramming. It takes the definitions of the particle class and the particle-particle interaction function as input, and generates high-performance parallel libraries for domain decomposition, particle exchange and interaction calculation. Using these functions, application programmers can develop their own parallel programs without spending much time writing and debugging parallel code in MPI. In fact, an application program written using FDPS functions contains no MPI calls and yet runs on a single core, on multiple cores using OpenMP, or on multiple nodes using MPI (or hybrid OpenMP-MPI) without any change to the source code. Since the initial release of FDPS in 2015, we have added many functionalities and performance improvements, which I'll cover in this talk. In the second part, I'll discuss how one can design a processor architecture that is "optimal" in some well-defined sense.
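The division of labour described in part I can be pictured with a small sketch: the application programmer supplies only a particle class and a pairwise interaction function, and a generic driver applies the interaction to all particles. Everything here is hypothetical and chosen for illustration. The names are not the FDPS API (FDPS itself is a C++ framework), and the driver is a plain serial all-pairs loop standing in for the generated domain decomposition, particle exchange and parallel interaction calculation.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    """User-defined particle class (hypothetical; 1-D positions for brevity)."""
    mass: float
    pos: float
    acc: float = 0.0


def gravity(pi: Particle, pj: Particle, eps: float = 1e-3) -> float:
    """User-defined interaction: softened gravitational pull of pj on pi."""
    dx = pj.pos - pi.pos
    return pj.mass * dx / (dx * dx + eps * eps) ** 1.5


def calc_force_all(particles, interaction) -> None:
    """Stand-in for framework-provided code: a serial O(N^2) all-pairs loop.

    A real framework would replace this loop with parallel, decomposed code
    while the two user definitions above stay unchanged.
    """
    for pi in particles:
        pi.acc = sum(interaction(pi, pj) for pj in particles if pj is not pi)


if __name__ == "__main__":
    system = [Particle(1.0, 0.0), Particle(1.0, 1.0)]
    calc_force_all(system, gravity)
    print([p.acc for p in system])
```

The point of the design is that the user code contains no parallelism at all, so the driver can be swapped for a parallel one without touching the physics.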
Date and time: Monday, September 2, 2019, 13:55 - 14:50
Venue: Auditorium, 6th floor, R-CCS
・Title: Recent achievements and future plans in the computational molecular science research team
・Speaker: Takahito Nakajima (Computational Molecular Science Research Team, Team Leader)
We will give a talk on the recent achievements and future plans of the computational molecular science research team. In particular, we will introduce the design of hole-transporting materials (HTMs) for perovskite solar cells. In this study, an efficient search for optimum HTMs was achieved by applying machine learning techniques. We employed a deep neural network to predict the power conversion efficiency of perovskite solar cells with HTMs, using molecular descriptors as input features. We also employed Gaussian process regression to evaluate the acquisition function in Bayesian optimization and to incorporate uncertainty and reliability into the prediction model. Discrete particle swarm optimization was applied to tackle the optimization problem in the vast chemical space. In addition, we will introduce the future development of a solar-cell simulator based on the dynamic Monte Carlo approach with first-principles calculations.
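To show how a Gaussian process prediction feeds an acquisition function, the sketch below evaluates expected improvement, one widely used acquisition function for maximization. The abstract does not say which acquisition function the team used, so this choice is an assumption, and the candidate names and their predicted means and standard deviations are made-up placeholders rather than results.

```python
import math


def expected_improvement(mu: float, sigma: float, best_so_far: float) -> float:
    """Expected improvement over best_so_far for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best_so_far, 0.0)  # no uncertainty: plain improvement
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far) * cdf + sigma * pdf


def pick_next(candidates: dict, best_so_far: float) -> str:
    """Return the candidate whose (mean, std) prediction maximizes expected improvement."""
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best_so_far),
    )


if __name__ == "__main__":
    # Hypothetical predicted efficiency (mean, std) for two candidate molecules.
    predictions = {"candidate_A": (18.0, 0.1), "candidate_B": (17.5, 2.0)}
    print(pick_next(predictions, best_so_far=18.0))
```

In this made-up example the less certain candidate is chosen even though its predicted mean is lower, which is how the predictive uncertainty steers the search toward unexplored regions.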
Date and time: Thursday, August 22, 2019, 14:00 - 14:40
Venue: Auditorium, 6th floor, R-CCS
・Title: The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees?
・Speaker: Jens Domke (High Performance Big Data Research Team)
The state-of-the-art topology for modern supercomputers is the Folded Clos network, a.k.a. Fat-Tree. The node count in these massively parallel systems is steadily increasing. This forces an increased path length, which limits gains for latency-sensitive applications. A novel, yet so far only theoretically investigated, alternative is the low-diameter HyperX. To perform a fair side-by-side comparison between a 3-level Fat-Tree and a 12x8 HyperX, we constructed the world’s first 3 Pflop/s supercomputer with these two networks. We show through a variety of benchmarks that the HyperX, together with our novel communication pattern-aware routing, can challenge the performance of traditional Fat-Trees.
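The "low-diameter" property can be checked with a few lines of code. In a HyperX, switches sit on a multi-dimensional grid and each switch is directly linked to every other switch that differs from it in exactly one coordinate, so the minimal switch-to-switch hop count is the number of differing coordinates. The sketch below applies this to a 12x8 grid; it only models the regular topology and says nothing about the routing, cabling or node attachment of the actual machine, and the function names are hypothetical.

```python
from itertools import product


def hyperx_hops(a, b):
    """Minimal switch-to-switch hops: one hop per coordinate that differs."""
    return sum(1 for x, y in zip(a, b) if x != y)


def hyperx_summary(shape):
    """Return (number of switches, inter-switch links per switch, diameter)."""
    switches = list(product(*(range(size) for size in shape)))
    # The topology is symmetric, so distances from one switch give the diameter.
    diameter = max(hyperx_hops(switches[0], s) for s in switches)
    links_per_switch = sum(size - 1 for size in shape)
    return len(switches), links_per_switch, diameter


if __name__ == "__main__":
    print(hyperx_summary((12, 8)))  # prints "(96, 18, 2)"
```

The price of the two-hop diameter is the high number of inter-switch links per switch, which is one of the trade-offs against a Fat-Tree.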
Date and time: Thursday, August 22, 2019, 14:40 - 15:20
Venue: Auditorium, 6th floor, R-CCS
・Title: Modelizing communication hiding for high-performance strong scaling
・Speaker: Masatoshi Kawai (HPC Usability Research Team)
Communication hiding is a well-known approach to improving application performance. In particular, effective communication hiding is essential for achieving ideal parallel performance under strong scaling. However, in some applications, communication cannot be hidden as well as expected. In this study, we model communication hiding for each application and system. Using this model, we judge whether communication hiding is useful or not. In this talk, we discuss the validity of the model with general stencil problems.
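A minimal way to picture such a judgement is the textbook overlap model below. It is not the model presented in the talk: it assumes that only the interior computation can overlap with the halo exchange, that the boundary computation must wait for the exchange, and that overlapping costs a fixed overhead. All parameter names and numbers are hypothetical.

```python
def step_times(t_inner, t_boundary, t_comm, overlap_overhead=0.0):
    """Return (blocking, hidden) time per step for a halo-exchange stencil.

    blocking: communicate first, then compute everything.
    hidden:   interior computation overlaps the exchange; the boundary
              computation runs once both have finished.
    """
    blocking = t_comm + t_inner + t_boundary
    hidden = max(t_inner, t_comm) + t_boundary + overlap_overhead
    return blocking, hidden


def hiding_is_useful(t_inner, t_boundary, t_comm, overlap_overhead=0.0):
    """True if the overlapped schedule is predicted to be faster."""
    blocking, hidden = step_times(t_inner, t_boundary, t_comm, overlap_overhead)
    return hidden < blocking


if __name__ == "__main__":
    # Large local domain: plenty of interior work to hide the exchange behind.
    print(step_times(10.0, 2.0, 4.0), hiding_is_useful(10.0, 2.0, 4.0))
    # Strong-scaling limit: little interior work left, overhead dominates.
    print(step_times(1.0, 2.0, 4.0, 1.5), hiding_is_useful(1.0, 2.0, 4.0, 1.5))
```

Under these assumptions the benefit of hiding is bounded by the smaller of the interior computation time and the communication time, which shrinks as strong scaling reduces the local domain.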
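Date and time: Monday, August 5, 2019, 13:00 - 13:55
Venue: Auditorium, 6th floor, R-CCS
・Title: Large-scale simulation of cortico-cerebello-thalamo-cortical circuit on the K computer
・Speaker: Jun Igarashi (Computational Engineering Applications Unit, Head Office for Information Systems)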
Whole-brain simulation allows us to understand the full interactions among neurons and helps elucidate brain function and disease. However, it has not yet been realized due to the insufficient computational power of current supercomputers and the lack of experimental data on the brain.
In this study, we propose an efficient and scalable parallelization method for whole-brain simulation on next-generation supercomputers. We focus on a biological feature of the brain: its major parts, the cortex and the cerebellum, form layered sheet structures with dense local and sparse remote connections. To exploit this feature, our proposed method combines a tile-partitioning method with a communication method that uses the synaptic transmission delay. The method showed good weak-scaling performance for simulations of the cortex, the cerebellum, and cortico-cerebello-thalamo-cortical circuits on the K computer. These results suggest that the size of the model may scale to human brain size on the Fugaku computer. Whole-brain simulation on next-generation supercomputers may lead to a new paradigm of brain research.
Date and time: Monday, August 5, 2019, 13:55 - 14:50
Venue: Auditorium, 6th floor, R-CCS
・Title: Recent Progress on Big Data Assimilation in Numerical Weather Prediction
・Speaker: Takemasa Miyoshi (Data Assimilation Research Team, Team Leader)
The Big Data Assimilation (BDA) project in numerical weather prediction (NWP) started in October 2013 under the Japan Science and Technology Agency (JST) CREST program, and ended its 5.5-year period in March 2019. The direct follow-on project was accepted and started in April 2019 under the JST AIP Acceleration Research, with an emphasis on the connection with AI technologies, in particular an integration of DA and AI with high-performance computation (HPC). The BDA project aimed to take full advantage of “big data” from advanced sensors such as the phased array weather radar (PAWR) and the Himawari-8 geostationary satellite, which provide two orders of magnitude more data than the previous sensors. We have achieved successful case studies with a newly developed 30-second-update, 100-m-mesh NWP system based on RIKEN’s SCALE model and the local ensemble transform Kalman filter (LETKF) to assimilate PAWR data in Osaka and Kobe. We have been actively developing the workflow for real-time weather forecasting. In addition, we developed two precipitation nowcasting systems using the every-30-second PAWR data: one based on optical flow, the other based on deep learning. We chose the convolutional Long Short Term Memory (Conv-LSTM) as the deep learning algorithm, and found it effective for precipitation nowcasting. The use of Conv-LSTM would lead to an integration of DA and AI with HPC. This presentation will include an overview of the BDA project toward a DA-AI-HPC integration under the new AIP Acceleration Research scheme, and recent progress of the project.
Date and time: Monday, August 5, 2019, 15:05 - 16:00
Venue: Auditorium, 6th floor, R-CCS
・Title: Extending Supercomputers with FPGA-based Custom Computing Machine
・Speaker: Kentaro Sano (Processor Research Team, Team Leader)
Custom computing with dedicated circuits on FPGAs (Field-Programmable Gate Arrays) is a promising way to accelerate computations that general-purpose multi-core processors are not good at. In our team, we have developed a system with Intel's 14nm Stratix10 FPGAs and a data-flow compiler which generates a pipelined custom hardware module to be embedded onto an FPGA and executed as stream computing.
In this talk, I introduce the system, including the FPGAs' dedicated network subsystem, the data-flow compiler, and the applications expected to be off-loaded to FPGAs, as well as the challenges to be tackled in our project. Finally, I discuss what a future supercomputer should be in the Post-Moore era.
Dr. Kentaro Sano received his Ph.D. from GSIS, Tohoku University, in 2000.
From 2000 to 2005, he was a Research Associate at Tohoku University.
From 2005 to 2018, he was an Associate Professor at Tohoku University.
He was a visiting researcher at the Department of Computing, Imperial College, London, and Maxeler corporation in 2006 and 2007. Since 2017, he has been the team leader of the Processor Research Team at R-CCS, RIKEN.
His research interests include FPGA-based high-performance reconfigurable computing systems especially for scientific numerical simulations and machine learning, high-level synthesis compilers and tools for reconfigurable custom computing machines, and system architectures for next-generation supercomputing based on the data-flow computing model.
Date and time: Monday, August 5, 2019, 16:00 - 16:55
Venue: Auditorium, 6th floor, R-CCS
・Title: Importance of turbulence process with cloud ~ Toward global large eddy simulation model ~
・Speaker: Hirofumi Tomita (Computational Climate Science Research Team, Team Leader)
In the K computer era, the global cloud-resolving model was established to some extent. The Computational Climate Research Team now aims to develop a next-generation climate model with super-high resolution, which explicitly resolves the phenomena based on more principled theoretical modeling. In such a model, a key process is the representation of turbulence in a rotating stratified fluid. Furthermore, turbulence interacts tightly with cloud condensation. After a brief introduction of the team's aims, we will talk about past achievements and future plans, focusing on turbulence modeling.