The supercomputer Fugaku will consist of more than 150,000 compute nodes, each of which has a so-called manycore CPU that integrates many CPU cores on a single chip. The Post-K (Fugaku) development project tackles several challenging issues so that the system software can provide an efficient execution environment. Three of them are introduced below.
Issues on managing manycore CPUs
The execution time of an application depends on how CPU cores and memory are allocated to the application and to the system software, and on which system software services the application requests. We are developing new system software that provides the best execution environment for each application by selecting the system software services it needs. Application programs written for Linux can run on it without any modifications.
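As a rough illustration of the core allocation question, the sketch below splits the core IDs of one node into a set for the application and a set reserved for system software. The function name `partition_cores` and the core counts are hypothetical and are not taken from the Fugaku system software.

```python
from __future__ import annotations


def partition_cores(total_cores: int, system_cores: int) -> tuple[set[int], set[int]]:
    """Split core IDs 0..total_cores-1 into (application, system) sets.

    The last `system_cores` IDs are reserved for system software;
    this layout is an assumption made only for illustration.
    """
    if not 0 <= system_cores < total_cores:
        raise ValueError("system_cores must be in [0, total_cores)")
    split = total_cores - system_cores
    core_ids = list(range(total_cores))
    return set(core_ids[:split]), set(core_ids[split:])


# Hypothetical node with 16 cores, 2 of which are reserved for system software.
app_cores, sys_cores = partition_cores(total_cores=16, system_cores=2)
print(sorted(app_cores))
print(sorted(sys_cores))
```

On Linux, a set such as `app_cores` could be passed to `os.sched_setaffinity` to pin a process to those cores, provided the IDs exist on the machine.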
Issues on communication latency
Communication latency is not expected to improve as drastically as the performance of a compute node. As a result, communication latency will account for a growing share of an application's total execution time. We are designing new communication software in cooperation with application developers to reduce this latency.
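The effect can be seen with a simple model in which compute time shrinks as nodes get faster while per-message latency stays fixed. The sketch below is only an illustration: the function name `latency_fraction`, the message count, and the timings are assumed values, not measurements from Fugaku or the K computer.

```python
def latency_fraction(compute_s: float, messages: int,
                     latency_s: float, speedup: float) -> float:
    """Fraction of total run time spent in communication latency.

    Model assumption: compute time scales as 1/speedup, while the
    latency of each message does not improve at all.
    """
    compute = compute_s / speedup
    comm = messages * latency_s
    return comm / (compute + comm)


# Hypothetical workload: 10 s of compute, one million messages, 1 microsecond each.
for speedup in (1, 10, 100):
    share = latency_fraction(10.0, 1_000_000, 1e-6, speedup)
    print(f"node speedup x{speedup}: latency share = {share:.2f}")
```

Under these assumed numbers the latency share grows from under a tenth of the run time to about nine tenths as the node becomes 100 times faster, which is why the latency itself has to be reduced.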
Issues on big data movement
In many traditional large-scale supercomputers, such as the K computer, the data set used by an application is transferred from the global storage to the local storage, and then from the local storage to the compute nodes. After the application finishes, the result data set is transferred from the compute nodes back to the global storage via the local storage. Reducing the data transfer time in such a system requires a high-performance network, which is expensive and consumes a large amount of electricity. We are designing new system software that places the data set for an application in the right location and reduces the data transfer time without extra network hardware resources.
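A back-of-the-envelope estimate shows why data placement matters. The sketch below assumes that the hops run one after another with no overlap and that each hop is limited only by its bandwidth; the function name `transfer_time`, the data size, and the bandwidths are hypothetical.

```python
def transfer_time(size_gb: float, hop_bandwidths_gb_per_s: list[float]) -> float:
    """Time in seconds to move a data set across a chain of hops.

    Assumption: hops are sequential and do not overlap, so the
    per-hop times simply add up.
    """
    return sum(size_gb / bw for bw in hop_bandwidths_gb_per_s)


SIZE_GB = 1000.0          # hypothetical input data set
GLOBAL_TO_LOCAL = 100.0   # GB/s, assumed
LOCAL_TO_NODE = 200.0     # GB/s, assumed

# Staged input: global storage -> local storage -> compute nodes.
staged = transfer_time(SIZE_GB, [GLOBAL_TO_LOCAL, LOCAL_TO_NODE])

# Data already placed on local storage: only the last hop remains.
placed = transfer_time(SIZE_GB, [LOCAL_TO_NODE])

print(f"staged: {staged:.1f} s, pre-placed: {placed:.1f} s")
```

With these assumed figures, placing the data on the local storage ahead of time removes the slower hop from the critical path without adding any network hardware.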