Research Topics

Efficient simulation code for supercomputer Fugaku: use of uTofu interface

Lattice QCD is a typical application limited by network bandwidth. The most time-consuming part is the linear solver for the discretized Dirac equation, which is solved with an iterative method. The kernel is a 9-point stencil computation on a 4-dimensional structured lattice (each site couples to itself and to its two nearest neighbors in each of the four directions), and it parallelizes trivially over the lattice sites. Because halo exchange communication is required frequently, it is crucial both to reduce the communication overhead and to saturate the communication bandwidth.
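To make the stencil structure concrete, here is a minimal sketch in plain Python. It is a scalar toy operator, not the Dirac operator used in QWS: gauge links and spin structure are omitted, and the lattice extents, the hopping coefficient `hop`, and all function names are assumptions chosen for illustration. The periodic wrap stands in for the data that a distributed run would receive from neighboring processes through halo exchange.

```python
from itertools import product

DIMS = (4, 4, 4, 4)  # hypothetical local lattice extents (x, y, z, t)


def site_index(coords, dims=DIMS):
    """Lexicographic index of a site, with x running fastest."""
    idx = 0
    for c, n in zip(reversed(coords), reversed(dims)):
        idx = idx * n + c
    return idx


def stencil_sites(coords, dims=DIMS):
    """Return the 9 stencil points: the site itself, then its +1 and -1
    neighbors in each of the 4 directions (periodic wrap)."""
    points = [tuple(coords)]
    for mu in range(4):
        for step in (+1, -1):
            nb = list(coords)
            nb[mu] = (nb[mu] + step) % dims[mu]
            points.append(tuple(nb))
    return points


def apply_stencil(field, dims=DIMS, hop=0.125):
    """Toy operator: out(x) = field(x) - hop * (sum over the 8 neighbors)."""
    out = [0.0] * len(field)
    for coords in product(*(range(n) for n in dims)):
        centre, *neighbours = stencil_sites(coords, dims)
        total = sum(field[site_index(nb, dims)] for nb in neighbours)
        i = site_index(centre, dims)
        out[i] = field[i] - hop * total
    return out


# Each site is updated independently of the others, which is why the
# loop over sites can be split across threads or processes.
volume = DIMS[0] * DIMS[1] * DIMS[2] * DIMS[3]
result = apply_stencil([1.0] * volume)
```

With a constant input field and `hop = 0.125`, the eight neighbor contributions cancel the site term exactly, which gives a quick sanity check of the neighbor bookkeeping.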

For this purpose, the QCD Wide SIMD library (QWS) [1], which was developed as part of the co-design activity for the supercomputer Fugaku, uses the uTofu interface. In QWS, neighbor communication is performed with RDMA put through the uTofu interface. This interface allows the sender to specify which of the 6 Tofu Network Interfaces (TNIs) is used to send the data in each direction, which helps to balance the load among the TNIs and to saturate the network bandwidth. The puts are issued in 7 directions simultaneously by parallel threads (because of the domain-decomposed algorithm, there are 7 directions rather than 8). QWS also uses the cache injection mechanism, in which arriving data is written directly to the last-level cache. The details of QWS are available in [2].
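The idea of balancing 7 communication directions over 6 TNIs can be sketched as a small scheduling problem. The following is only an illustration under stated assumptions: it does not call the uTofu API, it is not the assignment that QWS actually uses, and the direction labels, message sizes, and the greedy strategy are all hypothetical.

```python
NUM_TNIS = 6  # number of TNIs, as stated in the text


def assign_tnis(message_bytes, num_tnis=NUM_TNIS):
    """Greedy balancing: take messages from largest to smallest and give
    each one to the TNI that currently carries the fewest bytes."""
    load = [0] * num_tnis
    assignment = {}
    ordered = sorted(message_bytes.items(), key=lambda kv: -kv[1])
    for direction, size in ordered:
        tni = min(range(num_tnis), key=lambda i: load[i])
        assignment[direction] = tni
        load[tni] += size
    return assignment, load


# Hypothetical halo sizes in bytes for the 7 directions (labels are
# placeholders; the text does not say which directions are used).
sizes = [4096, 4096, 2048, 2048, 1024, 1024, 512]
messages = {f"dir{d}": size for d, size in enumerate(sizes)}

assignment, load = assign_tnis(messages)
for direction in sorted(assignment):
    print(direction, "-> TNI", assignment[direction])
print("bytes per TNI:", load)
```

Since there are more directions than TNIs, at least one TNI has to carry two messages; in this sketch the greedy rule places the smallest message on the least-loaded TNI. In a real code the chosen TNI would then be passed to the communication layer when each put is issued.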

References

[1] https://github.com/RIKEN-LQCD/qws
[2] https://arxiv.org/abs/2109.10687

Lattice QCD simulation with chiral fermions

Under Construction.