RIKEN Center for Computational Science

Supercomputer Fugaku

Outline of the Development of the Supercomputer Fugaku

The supercomputer Fugaku will be developed based on the following guiding principles:

  • Top priority on problem-solving research
    During development, highest priority will be given to creating a system capable of contributing to the solution of various scientific and societal issues. To this end, the hardware and software will be developed in a coordinated way (co-design), with the aim of making the system usable in a variety of fields.
  • World-leading performance
    Create the most advanced general-use system in the world.
  • Improve performance through international cooperation
    While leveraging Japan’s strengths, cooperate internationally to develop world-leading technologies of the highest quality that become the international standard.
  • Continue the legacy of the K computer
    Make the fullest use of the technologies, human resources, and applications of the K computer project for developing the Fugaku system.


Number of Nodes

Total: 158,976 nodes
  • 384 nodes x 396 racks = 152,064
  • 192 nodes x 36 racks = 6,912
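
The node total follows directly from the two rack configurations listed above. A minimal sketch of that arithmetic (the names `RACK_GROUPS` and `total_nodes` are illustrative and not part of any Fugaku tooling):

```python
# Rack configuration as listed above: (nodes per rack, number of racks).
RACK_GROUPS = [(384, 396), (192, 36)]


def total_nodes(groups):
    """Sum the node count over all rack groups."""
    return sum(nodes_per_rack * racks for nodes_per_rack, racks in groups)


if __name__ == "__main__":
    for nodes_per_rack, racks in RACK_GROUPS:
        subtotal = nodes_per_rack * racks
        print(f"{nodes_per_rack} nodes x {racks} racks = {subtotal:,}")
    print(f"Total: {total_nodes(RACK_GROUPS):,} nodes")
```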

Peak Performance

Normal Mode (2.0 GHz):
  • Double Precision (64 bit): 488 Petaflops
  • Single Precision (32 bit): 977 Petaflops
  • Half Precision (16 bit): 1.95 Exaflops
  • Integer (8 bit): 3.90 Exaops
Boost Mode (2.2 GHz):
  • Double Precision (64 bit): 537 Petaflops
  • Single Precision (32 bit): 1.07 Exaflops
  • Half Precision (16 bit): 2.15 Exaflops
  • Integer (8 bit): 4.30 Exaops
Total Memory: 4.85 PiB
Total Memory Bandwidth: 163 PB/s
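
The system-wide figures appear to be the per-node figures from the Node section multiplied by the 158,976 nodes. The sketch below checks that under this assumption; the function and constant names are illustrative. The 8-bit integer rate is left out because this page gives no per-node value for it.

```python
# Values taken from this page; names are illustrative only.
NODES = 158_976
NODE_PEAK_TF = {
    "normal": {"DP": 3.072, "SP": 6.144, "HP": 12.288},     # 2.0 GHz
    "boost": {"DP": 3.3792, "SP": 6.7584, "HP": 13.5168},   # 2.2 GHz
}
NODE_MEMORY_GIB = 32        # HBM2 capacity per node
NODE_MEMORY_BW_GBS = 1024   # HBM2 bandwidth per node, GB/s


def system_peak_pf(mode, precision):
    """System peak in petaflops: nodes x per-node teraflops / 1000."""
    return NODES * NODE_PEAK_TF[mode][precision] / 1000.0


def total_memory_pib():
    """Total memory in PiB (1 PiB = 1024**2 GiB)."""
    return NODES * NODE_MEMORY_GIB / 1024**2


def total_memory_bw_pbs():
    """Aggregate memory bandwidth in PB/s (1 PB/s = 10**6 GB/s)."""
    return NODES * NODE_MEMORY_BW_GBS / 1e6


if __name__ == "__main__":
    for mode in ("normal", "boost"):
        for precision in ("DP", "SP", "HP"):
            pf = system_peak_pf(mode, precision)
            print(f"{mode:6s} {precision}: {pf:8.1f} PF")
    print(f"Total memory:    {total_memory_pib():.2f} PiB")
    print(f"Total bandwidth: {total_memory_bw_pbs():.0f} PB/s")
```

Rounded to three significant figures, these products match the values listed above. Peak rates use decimal prefixes, while memory capacity uses binary prefixes.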

Node

Architecture: Armv8.2-A SVE 512bit
  With the following Fujitsu extensions: Hardware barrier, Sector cache, and Prefetch
Core: 48 cores for compute and 2 or 4 cores for OS activities
  4 CMGs (NUMA nodes)
Performance:
  Normal Mode (2.0 GHz): DP: 3.072 TF, SP: 6.144 TF, HP: 12.288 TF
  Boost Mode (2.2 GHz): DP: 3.3792 TF, SP: 6.7584 TF, HP: 13.5168 TF
Cache*1 *2:
  L1D/core: 64 KiB, 4way, 256 GB/s (load), 128 GB/s (store)
  L2/CMG: 8 MiB, 16way
  L2/node: 4 TB/s (load), 2 TB/s (store)
  L2/core: 128 GB/s (load), 64 GB/s (store)
Memory: HBM2 32 GiB, 1024 GB/s
Interconnect: Tofu Interconnect D (28 Gbps x 2 lane x 10 port)
I/O: PCIe Gen3 x16
Technology: 7nm FinFET

*1 Performance at 2 GHz
*2 For more information, please refer to https://github.com/fujitsu/A64FX .
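
The per-node peak figures can be reproduced from the core count, clock rate, and 512-bit vector width. This page does not state the number of floating-point pipelines, so the sketch assumes two fused multiply-add (FMA) pipelines per core, each counted as 2 floating-point operations per lane per cycle. The last function multiplies out the interconnect link parameters to give a raw signalling rate; usable throughput will be lower once encoding and protocol overhead are taken into account. All names are illustrative.

```python
VECTOR_BITS = 512      # SVE vector length, from this page
COMPUTE_CORES = 48     # compute cores per node, from this page
FMA_PIPES = 2          # ASSUMPTION: two FMA pipelines per core
FLOPS_PER_FMA = 2      # one multiply plus one add


def node_peak_tf(clock_ghz, element_bits):
    """Peak teraflops per node for a given clock and element width."""
    lanes = VECTOR_BITS // element_bits
    flops_per_cycle = FMA_PIPES * lanes * FLOPS_PER_FMA
    # cores x GHz x flops/cycle gives gigaflops; divide by 1000 for teraflops.
    return COMPUTE_CORES * clock_ghz * flops_per_cycle / 1000.0


def raw_link_rate_gbps(gbps_per_lane=28, lanes=2, ports=10):
    """Raw aggregate signalling rate across all ports, in Gbps."""
    return gbps_per_lane * lanes * ports


if __name__ == "__main__":
    for label, clock in (("Normal", 2.0), ("Boost", 2.2)):
        dp = node_peak_tf(clock, 64)
        sp = node_peak_tf(clock, 32)
        hp = node_peak_tf(clock, 16)
        print(f"{label} ({clock} GHz): DP {dp:.4f} TF, SP {sp:.4f} TF, HP {hp:.4f} TF")
    print(f"Raw interconnect rate: {raw_link_rate_gbps()} Gbps per node")
```

Under these assumptions the results agree with the Normal Mode and Boost Mode values listed above.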


SVE: Scalable Vector Extension

CPU Die (Courtesy of FUJITSU LIMITED)



Storage System

1st Layer
  • Cache for global file system
  • Temporary file systems
    • Local file system for compute node
    • Shared file system for a job
2nd Layer
  • Lustre-based global file system
3rd Layer
  • Cloud storage services (in preparation)


Programming Language and Library

Compiler:
  Fortran2008 & Fortran2018 subset
  C11 & GNU and Clang extensions
  C++14 & C++17 subset and GNU and Clang extensions
  OpenMP 4.5 & OpenMP 5.0 subset
  Java
Parallel Programming:
  XcalableMP
  FDPS
Script Language: Python + NumPy + SciPy, Ruby
Math Library:
  BLAS, LAPACK, ScaLAPACK
  SSL II (Fujitsu)
  EigenExa, Kevd, Batched BLAS, 2.5D-PDGEMM


System Software

OS:
  Red Hat Enterprise Linux 8
  McKernel
MPI: Fujitsu MPI (Based on OpenMPI), RIKEN-MPICH (Based on MPICH)
File IO:
  LLIO
  Application-oriented file IO libraries