3.1.8.3.7.4. Hybrid parallel

This section shows how to compile a C program for a multi-node (hybrid parallel) job and run it.

Prepare the source program.

A sample program is provided as /home/system/sample/C/mpi/trad_sample_hybrid.c.

#include <stdio.h>
#include "mpi.h"
#define SIZE 9000

int
main(int argc, char *argv[])
{
        int     rank, size, root;
        int     data, result;
        int     i,j;
        /* Three SIZE x SIZE automatic arrays: about 1.9 GB of stack in total. */
        double  a[SIZE][SIZE],b[SIZE][SIZE],c[SIZE][SIZE];

        result = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Per-rank work. With -Kparallel the compiler reports the i loop as parallelized. */
        for(i=0; i < SIZE; i++){
                for(j=0; j < SIZE; j++){
                        a[i][j] = (double)(i+j*0.5);
                        /* j/(rank+1) is an integer division. */
                        b[i][j] = (double)(i+j/(rank+1));
                        c[i][j] = a[i][j] + b[i][j];
                }
        }

        /* The double quotient is truncated when it is stored in the int. */
        data = c[1][1]/(rank+1);

        if (rank == 0) {
                fprintf(stdout, "MPI communication start. size=%d\n", size);
                fflush(stdout);
        }

        /* Sum the per-rank values onto the root rank. */
        root = 0;
        MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);

        if (rank == 0) {
                fprintf(stdout, "MPI communication end\n");
                fprintf(stdout, "result(%d)\n",result);
                fflush(stdout);
        }

        MPI_Finalize();
        return 0;
}

Compile the sample program.

[_LNlogin]$ mpifccpx -V -Kfast,parallel,optmsg=2 -o sample_mpi trad_sample_hybrid.c
fccpx: Fujitsu C/C++ Compiler 4.1.0 tcsds-1.2.24
simulating gcc version 6.1
ccpcompx: Fujitsu C/C++ Compiler 4.1.0 (Feb 26 2020 07:47:51)
Parallelization messages
  jwd5001p-i  "trad_sample_hybrid.c", line 19: This loop with loop variable 'i' is parallelized.
  jwd6001s-i  "trad_sample_hybrid.c", line 20: SIMD conversion is applied to this loop with the loop variable 'j'.
  jwd8220o-i  "trad_sample_hybrid.c", line 20: Optimizations that may cause side effect are applied.
  jwd8204o-i  "trad_sample_hybrid.c", line 20: This loop is software pipelined.
  jwd8205o-i  "trad_sample_hybrid.c", line 20: The software-pipelined loop is chosen at run time when the iteration count is greater than or equal to 48.
GNU assembler version 2.30 (aarch64-linux-gnu) using BFD version version 2.30-49.el7
GNU ld version 2.30-49.el7
  Supported emulations:
   aarch64linux
   aarch64elf
   aarch64elf32
   aarch64elf32b
   aarch64elfb
   armelf
   armelfb
   aarch64linuxb
   aarch64linux32
   aarch64linux32b
   armelfb_linux_eabi
   armelf_linux_eabi
   i386pep
   i386pe

Prepare the job script.

A sample job script is provided as /home/system/sample/C/mpi/trad_job_mpi.sh.

#!/bin/sh
#PJM -L "node=2"
#PJM -L "rscgrp=small"
#PJM -L "elapse=10:00"
#PJM --mpi max-proc-per-node=4
#PJM -x PJM_LLIO_GFSCACHE=/vol000N
#PJM -g groupname
#PJM -s

# execute job
export OMP_NUM_THREADS=12
mpiexec -n 8 ./sample_mpi

Submit the job with the pjsub command.

[_LNlogin]$ pjsub trad_job_mpi.sh
[INFO] PJM 0000 pjsub Job 26 submitted.

Check the execution result.

Standard output is written to a file named <job name>.<job ID>.out.

[_LNlogin]$ cat trad_job_mpi.sh.26.out
MPI communication start. size=8
MPI communication end
result(4)
