Usage on Intel PC cluster machines
In this page, we show an example of the usage of GENESIS on Intel PC cluster machines. Because the actual usage depends on the machine environment, the following protocol may not be directly applicable to your system. Here, we assume that the machine's IP address is 192.168.1.2, each computational node has 16 CPU cores (2 CPUs with 8 cores each), gridengine or one of its variants is installed as the job scheduler, and the user's MPI environment is set with mpi-selector-menu.
without GPGPU
The following are examples of the installation scheme, a batch script for a hybrid MPI/OpenMP computation with 16 MPI processes and 4 OpenMP threads (64 CPU cores in total), and the commands to submit a job.
Installation
# Login the machine
$ ssh 192.168.1.2
# select one proper option (e.g., ib-openmpi-1.10.1_intel-15.0.4_cuda-6.5)
$ mpi-selector-menu
# login again to update MPI setting
$ exit
$ ssh 192.168.1.2
# install genesis
$ cd /swork/user/genesis
$ ./configure
$ make
$ make install
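As a quick check that the build finished, the following sketch simply confirms that the spdyn binary exists in the bin directory used as BINDIR in the batch scripts below:
# the spdyn binary used in the batch scripts should now exist
$ ls /swork/user/genesis/bin/spdyn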
Batch script
Example 1 (Recommended):
#$ -S /bin/bash
#$ -cwd
#$ -pe ompi 64
#$ -V
#$ -q nogpu.q

BINDIR=/swork/user/genesis/bin
mpirun -machinefile $TMP/machines -np 16 -npernode 4 -npersocket 2 -x OMP_NUM_THREADS=4 ${BINDIR}/spdyn INP > md.log
If “-npernode X” does not work, use “--bind-to socket”.
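For example, a hedged sketch of the Example 1 mpirun line with the process-placement options replaced by --bind-to socket (whether the ranks are then still distributed as 4 per node depends on the machinefile and on your OpenMPI version, so check with top as described below):
BINDIR=/swork/user/genesis/bin
mpirun -machinefile $TMP/machines -np 16 --bind-to socket -x OMP_NUM_THREADS=4 ${BINDIR}/spdyn INP > md.log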
Example 2:
#$ -S /bin/bash
#$ -cwd
#$ -pe ompi 64
#$ -V
#$ -q nogpu.q

export OMP_NUM_THREADS=4
BINDIR=/swork/user/genesis/bin
mpirun -machinefile $TMP/machines -np 16 -npernode 4 -npersocket 2 ${BINDIR}/spdyn INP > md.log
In the batch script,
red (64 in "-pe ompi 64"): Total number of CPU cores to be used
orange (4 in "OMP_NUM_THREADS=4"): Number of OpenMP threads
blue (16 in "-np 16"): Number of MPI processes
green (4 in "-npernode 4"): Number of MPI processes in each node
magenta (2 in "-npersocket 2"): Number of MPI processes in one CPU
purple (nogpu.q in "-q nogpu.q"): queue name
and the relationships between these values are
red = orange * blue (64 = 4 * 16)
total number of CPU cores in one node = orange * green (16 = 4 * 4)
total number of CPU cores in one CPU = orange * magenta (8 = 4 * 2)
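As a quick way to check these relationships before editing the script, the following minimal bash sketch derives the mpirun options from the hardware assumed on this page (the variable names are ours, not part of GENESIS):
#!/bin/bash
# hardware assumed on this page: 16 CPU cores per node (2 CPUs x 8 cores), 64 cores requested in total
TOTAL_CORES=64       # red    : -pe ompi 64
OMP_THREADS=4        # orange : OMP_NUM_THREADS
CORES_PER_NODE=16
CORES_PER_CPU=8
NP=$((TOTAL_CORES / OMP_THREADS))            # blue   : -np 16
NPERNODE=$((CORES_PER_NODE / OMP_THREADS))   # green  : -npernode 4
NPERSOCKET=$((CORES_PER_CPU / OMP_THREADS))  # magenta: -npersocket 2
echo "mpirun -np ${NP} -npernode ${NPERNODE} -npersocket ${NPERSOCKET} with OMP_NUM_THREADS=${OMP_THREADS}"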
Usage
# Execute run1.sh to run3.sh sequentially
$ qsub -N R1 run1.sh
$ qsub -N R2 -hold_jid R1 run2.sh
$ qsub -N R3 -hold_jid R2 run3.sh
# Check running jobs
$ qstat -f
# delete a job
$ qdel JOB_ID
After submitting a job, the user should check whether the specified CPU resources are fully utilized by running the top command on some of the computational nodes. If the CPUs are not fully used, the mpirun options might not be appropriate. In some cases, --bind-to socket may have to be used instead of -npernode.
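A minimal sketch of such a check, assuming one of the assigned nodes is named node01 (the node name is an assumption; take the actual names from qstat -f):
# find the nodes assigned to the running job
$ qstat -f
# log in to one of them and watch the CPU usage
$ ssh node01
$ top
# each spdyn process should show roughly 400% CPU when OMP_NUM_THREADS=4;
# press "1" in top to see the load of the individual cores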
with GPGPU
The following are examples of the installation scheme and a batch script for a hybrid MPI/OpenMP computation with 16 MPI processes and 4 OpenMP threads using 2 GPU cards per node (64 CPU cores + 8 GPU cards in total).
Installation
# Login the machine
$ ssh 192.168.1.2
# select one proper option (e.g., ib-openmpi-1.10.1_intel-15.0.4_cuda-6.5)
$ mpi-selector-menu
# login again to update MPI setting
$ exit
$ ssh 192.168.1.2
# install genesis
$ cd /swork/user/genesis
$ ./configure --enable-gpu --enable-single
$ make
$ make install
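If the GPU build fails at the configure step, a quick thing to check is whether a CUDA toolkit is visible in the environment selected by mpi-selector-menu (a sketch; how CUDA is provided depends on your system):
# the CUDA compiler (nvcc) is needed for the GPU build
$ which nvcc
$ nvcc --version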
Batch script
GENESIS 1.1.2 or later
The numbers of MPI processes and OpenMP threads are specified in run.sh.
run.sh:

#$ -S /bin/bash
#$ -cwd
#$ -pe ompi 64
#$ -V
#$ -q gpu.q

export OMP_NUM_THREADS=4
BINDIR=/swork/user/genesis/bin
mpirun -machinefile $TMP/machines -np 16 -npernode 4 -npersocket 2 ${BINDIR}/spdyn INP > md.log
The number of GPU cards is detected automatically. If you want to exclude a specific GPU card, or want to use only a single specific GPU card among multiple cards, please check this page.
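For example, a hedged sketch of restricting the run to the first GPU card on each node using the standard CUDA environment variable (the device index 0 is an assumption; see the linked page for the GENESIS-specific options):
# make only GPU card 0 visible to spdyn on every node
export CUDA_VISIBLE_DEVICES=0
mpirun -machinefile $TMP/machines -np 16 -npernode 4 -npersocket 2 -x CUDA_VISIBLE_DEVICES ${BINDIR}/spdyn INP > md.log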
GENESIS 1.1.1 or before
The numbers of MPI processes and OpenMP threads are specified in run.sh, and the number of GPU cards is specified in wrap.sh.
run.sh:

#$ -S /bin/bash
#$ -cwd
#$ -pe ompi 64
#$ -V
#$ -q gpu.q

export OMP_NUM_THREADS=4
BINDIR=/swork/user/genesis/bin
mpirun -machinefile $TMP/machines -np 16 -npernode 4 -npersocket 2 ./wrap.sh ${BINDIR}/spdyn INP > md.log
In these batch scripts,
red (64 in "-pe ompi 64"): Total number of CPU cores to be used
orange (4 in "OMP_NUM_THREADS=4"): Number of OpenMP threads
blue (16 in "-np 16"): Number of MPI processes
green (4 in "-npernode 4"): Number of MPI processes in each node
magenta (2 in "-npersocket 2"): Number of MPI processes in one CPU
purple (gpu.q in "-q gpu.q"): queue name
and the relationships between these values are
red = orange * blue (64 = 4 * 16)
total number of CPU cores in one node = orange * green (16 = 4 * 4)
total number of CPU cores in one CPU = orange * magenta (8 = 4 * 2)
wrap.sh:
#!/bin/bash
# local rank of this MPI process within the node (set by OpenMPI)
lr=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
# distribute the local ranks over the GPU cards in a round-robin manner
gpuid=`expr ${lr} \% 2`
export CUDA_VISIBLE_DEVICES=${gpuid}
# run the command passed as arguments (spdyn and its input)
"$@"
where red (the 2 in the expr command) is the number of GPU cards attached to one node.
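To see what wrap.sh does, the following standalone sketch prints the same modulo mapping for the 4 MPI processes per node and 2 GPU cards assumed on this page:
#!/bin/bash
NGPU=2                    # GPU cards per node
for lr in 0 1 2 3; do     # local ranks on one node (-npernode 4)
  echo "local rank ${lr} -> CUDA_VISIBLE_DEVICES=$((lr % NGPU))"
done
# ranks 0 and 2 use GPU card 0, ranks 1 and 3 use GPU card 1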
The usage of GPU nodes (job submission, monitoring, and deletion) is the same as in the no-GPU case. Please see the explanation above.