RI-MP2 calculations for the world-record-scale nanographene dimer (C_{150}H_{30})_{2} were completed in 12 minutes and 39 seconds on the K computer using 80,199 nodes and 641,592 cores, at an effective performance of 3.1 PFLOPS (30% of the peak performance).

- Energy calculation for a two-layer nanographene dimer, (C_{150}H_{30})_{2}
- Size: 360 atoms, 9,840 atomic orbitals

- Algorithm: RI-MP2 (resolution-of-identity second-order Møller–Plesset perturbation method)
- Basis set: cc-pVTZ
- Libraries used: Fujitsu SSLII (BLAS, LAPACK, ScaLAPACK), SMASH 1.2, Libint 1.1.5

- Machine: The K computer
- Specifications

- Optimization of computation: DGEMM (Level 3 BLAS) was applied by rewriting the four-center two-electron integral transformation (O(N^5)), used in the MP2 perturbation calculation, as a matrix-matrix product.
- Optimization of communication: Implementation of hybrid parallelization with MPI (Message Passing Interface) and OpenMP (Open Multi-Processing). In particular, for the MPI parallelization, we propose a new parallel algorithm (a biaxial parallel method applied to the integral calculations) that scales well up to the full set of K computer nodes. In this way, large-scale electron-correlation calculations (RI-MP2/cc-pVTZ) were achieved.
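The DGEMM optimization above rests on the fact that the O(N^5) integral contraction in RI-MP2 is mathematically a matrix-matrix product. The toy sketch below illustrates this equivalence with NumPy; the dimensions, array names, and the NumPy formulation are illustrative only, not the NTChem code:

```python
import numpy as np

rng = np.random.default_rng(42)
n_occ, n_virt, n_aux = 4, 6, 20  # toy dimensions (hypothetical sizes)

# RI three-center factors B[P, ia]: (ia|jb) ~ sum_P B[P, ia] * B[P, jb]
B = rng.standard_normal((n_aux, n_occ * n_virt))

# Naive index-by-index contraction over the auxiliary basis P
iajb_loop = np.einsum('pi,pj->ij', B, B)

# The same contraction as one matrix-matrix product; for float64 arrays
# NumPy dispatches this to the underlying BLAS DGEMM routine
iajb_gemm = B.T @ B

assert np.allclose(iajb_loop, iajb_gemm)
```

Expressing the contraction as a single DGEMM call lets the BLAS library exploit cache blocking and SIMD units, which is why the measured efficiency (30% of peak) is so high for this workload.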

- Maximum used nodes: 80,199 nodes and 641,592 cores
- Execution time: 12 minutes 39 seconds
- Effective performance: 3.1 PFLOPS
- Ratio to peak performance: 30%
- Parallelization ratio: 99.99968%
- Scalability: It is shown in the figure below.

- When 8,911 nodes are used, the calculation runs in 45 minutes at 62% of peak performance.
- (C_{96}H_{24})_{2} (240 atoms and 6,432 orbitals): With 32,768 cores, the calculation runs in 11 minutes and 57 seconds.
- C_{240} fullerene (240 atoms and 7,600 orbitals): The calculation runs in 13 minutes.

Molecular pincette (buckycatcher): RI-MP2 energy difference calculations for C_{60}@C_{60}H_{28} using the biaxially parallelized NTChem were completed in 15 minutes and 55 seconds using 65,536 cores.

- Molecular pincette (buckycatcher): energy difference calculation for C_{60}@C_{60}H_{28}
- Size: 148 atoms and 3,884 atomic orbitals

- Algorithm: RI-MP2 (Resolution of Identity – Second-order Møller–Plesset perturbation method) gradient
- Basis set: def2-TZVP
- Libraries used: Fujitsu SSLII (BLAS, LAPACK, ScaLAPACK), SMASH 1.2, Libint 1.1.5

- Machine: the K computer
- Specifications

- Optimization of computation: DGEMM (Level 3 BLAS) was applied by rewriting the four-center two-electron integral transformation (O(N^5)), used in the MP2 perturbation calculation, as a matrix-matrix product.
- Optimization of communication: Implementation of hybrid parallelization with MPI (Message Passing Interface) and OpenMP (Open Multi-Processing). In particular, for the MPI parallelization, we propose a new parallel algorithm (a biaxial parallel method applied to the integral calculations) that scales well up to the full set of K computer nodes. In this way, large-scale electron-correlation calculations (RI-MP2/def2-TZVP) were achieved.
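The biaxial idea can be pictured as arranging the MPI ranks on a logical P × Q grid and distributing the two transformation indices along the two grid axes, so that both contraction steps are parallelized rather than just one. The following is a hypothetical round-robin mapping chosen purely for illustration; the actual NTChem distribution scheme may differ:

```python
# Hypothetical "biaxial" work distribution: ranks form a P x Q grid and
# each index pair (i, j) is owned by exactly one rank, so both axes of
# the O(N^5) transformation are distributed across the machine.
def biaxial_owner(i, j, p_rows, q_cols):
    """Map an index pair (i, j) to a rank on the P x Q grid (round-robin)."""
    return (i % p_rows) * q_cols + (j % q_cols)

n_occ, p_rows, q_cols = 8, 2, 4   # 8 ranks on a 2 x 4 grid (toy values)
work = {r: [] for r in range(p_rows * q_cols)}
for i in range(n_occ):
    for j in range(n_occ):
        work[biaxial_owner(i, j, p_rows, q_cols)].append((i, j))

# Every rank receives the same number of index pairs -> even load balance
counts = {len(pairs) for pairs in work.values()}
```

Because the pair space grows quadratically with the occupied dimension while a single-axis distribution grows only linearly, a two-axis scheme of this kind exposes enough independent work units to keep all 80,199 nodes busy.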

- Maximum used nodes: 8,192 nodes and 65,536 cores
- Execution time: 15 minutes and 55 seconds
- Parallelization ratio: 99.9981%
- Scalability: It is shown in the figure below.

To compute statistical physical quantities of the HCl molecule with a quantum Monte Carlo method, the diffusion Monte Carlo method was applied using 8,192 cores, achieving a parallelization efficiency of 84%.

- Calculation of statistical physical quantities of the HCl molecule using a quantum Monte Carlo method
- Size: 2 atoms, 819,200 Variational Monte Carlo (VMC) samples, and 409,600 Diffusion Monte Carlo (DMC) samples

- Algorithm: VMC method, DMC method
- Trial wave function: the restricted Hartree–Fock (RHF) wave function with the cc-pVTZ-DK basis set
- Libraries used: Fujitsu SSLII (BLAS, LAPACK, ScaLAPACK), Libint 1.1.5
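For intuition about what a DMC run does, here is a minimal diffusion Monte Carlo sketch for a 1D harmonic oscillator, whose exact ground-state energy is 0.5 in reduced units. It is a toy model entirely unrelated to the HCl production calculation; the time step, population size, and population-control constant are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n0 = 4000                  # target walker population (toy value)
tau = 0.01                 # imaginary-time step (toy value)
x = rng.normal(size=n0)    # initial walkers, close to the ground state
e_ref = 0.5                # reference energy for branching
samples = []
for step in range(2000):
    # Diffusion: each walker takes a Gaussian random step
    x = x + rng.normal(scale=np.sqrt(tau), size=x.size)
    v = 0.5 * x * x                       # harmonic potential V(x)
    w = np.exp(-tau * (v - e_ref))        # branching weight per walker
    # Stochastic birth/death: replicate each walker floor(w + u) times
    x = np.repeat(x, np.floor(w + rng.random(x.size)).astype(int))
    # Gentle population control keeps the walker count near n0
    e_ref = np.mean(0.5 * x * x) + 0.5 * (1.0 - x.size / n0)
    if step >= 500:                       # skip equilibration steps
        samples.append(np.mean(0.5 * x * x))  # <V> estimates E_0 here

energy = float(np.mean(samples))          # should be close to 0.5
```

The production calculation applies the same propagate-branch-average cycle, but with 409,600 walkers, a many-electron Hamiltonian, and an RHF trial wave function guiding the sampling.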

- Machine: the K computer
- Specifications

- Optimization of computation: molecular orbital (MO) calculations were performed simultaneously, and DGEMM (Level 3 BLAS) was applied.
- Optimization of communication: MPI parallelization by simply distributing walkers across processes (ensemble parallelism).
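Ensemble parallelism of this kind is nearly communication-free: each rank propagates its own share of walkers independently, and only scalar energy accumulators need to be combined (an MPI reduction in practice). A minimal sketch of the even split, with the rank count and remainder handling as illustrative assumptions:

```python
# Minimal sketch of ensemble parallelism: walkers are divided evenly
# across MPI ranks; any remainder is spread one extra walker per rank.
def partition_walkers(n_walkers, n_ranks):
    """Return the number of walkers owned by each rank."""
    base, extra = divmod(n_walkers, n_ranks)
    return [base + (1 if r < extra else 0) for r in range(n_ranks)]

# The DMC ensemble from this benchmark over the 8,192 cores used
shares = partition_walkers(409_600, 8_192)

assert sum(shares) == 409_600          # every walker is owned exactly once
assert max(shares) - min(shares) <= 1  # near-perfect load balance
```

Because walkers interact only through the shared reference energy, this decomposition explains the 99.9977% parallelization ratio reported below.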

- Maximum used nodes: 1,024 nodes and 8,192 cores
- Parallelization ratio: 99.9977%
- Scalability: It is shown in the figure below.

- Parallelization efficiency of 90% for OsO_{4}, using the VMC method with 25,600 cores.