TOP    Fugaku    About Fugaku

The system configuration of the supercomputer Fugaku is described below.

Total number of nodes

The total number of nodes in Fugaku is 158, 976.

A single CPU makes up a node, and two CPUs (two nodes) are mounted on a board called the CPU Memory Unit (CMU). Eight CMUs make up a "Bunch of Blades (BoB)," which means each BoB has 16 nodes. Three BoBs make up a Shelf, and therefore each Shelf has 48 nodes. Eight Shelves (384 nodes) are installed in a computer rack (some racks have 192 nodes). Fugaku is made up of 432 racks, of which 396 racks have 384 nodes, and 36 racks have 192 nodes. This makes a total of 158,976 nodes.

Peak Theoretical Performance

Fugaku can be run in a normal mode (CPU clock speed of 2 GHz) and a boost mode (CPU clock speed of 2.2 GHz), and the peak theoretical performance in each mode is summarized in the table below.
The high memory bandwidth is also one of the features of Fugaku.

Peak Theoretical Performance
Peak performance
Normal mode(CPU clock speed: 2 GHz)
  • Theoretical peak double precision performance (64 bit) 488 PFlops
  • Theoretical peak single precision performance (32 bit) 977 PFlops
  • Theoretical peak half precision (AI learning) performance (16 bit) 1.95 EFlops
  • Theoretical peak integer (AI inference) performance (8 bit) 3.90 EOps
Peak performance
Boost mode (CPU clock speed: 2.2 GHz)
  • Theoretical peak double precision performance (64 bit) 537 PFlops
  • Theoretical peak single precision performance (32 bit) 1.07 EFlops
  • Theoretical peak half precision (AI learning) performance (16 bit) 2.15 EFlops
  • Theoretical peak integer (AI inference) performance (8bit) 4.30 EOps
Total memory 4.85 PiB
Total memory bandwidth 163 PB/s

Single node performance

The single node performance is summarized below. The unique chip is based on the Arm instruction set architecture.

Single node performance
Instruction set architecture Armv8.2-A SVE 512 bit
Fujitsu extension: hardware barrier, sector cache, prefetch
Number of core 48 + 2 assistant cores
4 CMG (Core Memory Group, NUMA node)
Performance
Normal mode (CPU clock speed: 2 GHz)
Double precision: 3.072 TF; single precision: 6.144 TF; half-precision: 12.288 TF
Performance
Boost mode(CPU clock speed: 2.2 GHz)
Double precision: 3.3792 TF; single precision: 6.7584 TF; half-precision: 13.5168 TF
Cache *1 *2 L1D/core: 64 KiB, 4way, 256 GB/s (load), 128 GB/s (store)

L2/CMG: 8 MiB, 16way
L2/node: 4 TB/s (load), 2 TB/s (store)
L2/core: 128 GB/s (load), 64 GB/s (store)
Memory HBM2 32 GiB, 1024 GB/s
Interconnect Tofu Interconnect D (28 Gbps x 2 lane x 10 port)
I/O PCIe Gen3 x16
Technology 7 nm FinFET
  • 1 Cache performance is with the CPU clock speed of 2 GHz
  • 2 Please refer to GitHubThe webpage will open in a new tab. for details
CPU-Die (Image courtesy of Fujitsu) (*) Click to enlarge

CPU-Die (Image courtesy of Fujitsu)

Tofu Interconnect D

The 6D mesh/torus interconnect is used for communication between nodes. Low latency and high throughput are achieved by Remote Direct Memory Access (RDMA).

  • Latency with 8 B input: 0.49 - 0.54 micro-sec
  • Throughput with 1 MiB input: 6.35 GB/s
TofuD Interconnect structure
TofuD Interconnect structure
Six Tofu Network Interfaces (TNIs, each performs at 6.8 GB/s), 40.8 GB/s in total, are connected to the Tofu network router with 10 ports with 2 lanes each (20 lanes in total).

Tofu Cable

About 200,000 Tofu cables connect the CPUs of Fugaku. About half of them are fiber optic cables that run between racks, the total length of which is about 900 km.

Number Total length (m)
Fiber optic cables

97,632

855,101

Electrical cables

119,232

38,552

Total

216,864

893,653

Another approx. 10,000 cables connect the racks to the storage, etc.

Storage

1st Layer

  • LLIO (Lightweight Layered IO-Accelerator)
    LLIO is a file system dedicated to the job execution area.
    Providing the following three types of areas to the job.
    ・Node Temporary Area
    ・Shared Temporary Area
    ・Cache Area of 2nd Layer Storage

2nd Layer

  • FEFS (Fujitsu Exabyte File System)
    FEFS is a large shared area used by users and jobs.
    Luster-based file system.

3rd Layer

  • Commercial cloud storage services

I/O Network

Programming environment

Compiler Fortran 2008 and Fortran 2018
C11 with GNU and Clang extensions
C++14 and C++17 with GNU and Clang extensions
OpenMP 4.5 and OpenMP 5.0
Java
Parallel programing XcalableMP [Details of XcalableMP (PDF 535 KB)The PDF will open in a new tab.]
FDPS [Details of FDPS (PDF 260 KB)The PDF will open in a new tab.]
Script language Python / Numpy / Scipy, Ruby
Numerical library BLAS, LAPACK, ScaLAPACK
SSL II (Fujitsu)
EigenExa, Batched BLAS, 2.5D-PDGEMM

System software

Open-source management tool Spack [Details of Spack (PDF 355 KB)The PDF will open in a new tab.]
Container, virtual machine Singularity, KVM
OS Red Hat Enterprise Linux 8
McKernel [Details of McKernel (PDF 641 KB)The PDF will open in a new tab.]
MPI Fujitsu MPI (Based on OpenMPI),
MPICH-Tofu (Based on MPICH) [Details of MPICH-Tofu (PDF 404 KB)The PDF will open in a new tab.]
File IO LLIO
DTF (Data Transfer Framework) [Details of DTF (PDF 220 KB)The PDF will open in a new tab.]