3. Login¶
4. File Transfer¶
Files are transferred to and from the pre/post environment with the scp command.
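For example, a file can be copied from a local machine to your home directory in the pre/post environment as follows (the user name and host name are placeholders; use the login node address provided for your account):
[Local]$ scp sample.c <username>@<login-node>:~/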
In addition, the storage capacity and the maximum number of files are limited in the same way as in the Fugaku main system.
5. Compile¶
5.1. GCC¶
The following are examples of using GCC.
Large Memory Nodes
[Login]$ srun -p mem1 -n 1 --mem 27G --time=5 --pty bash -i
[MEM]$ gcc -o a.out sample.c
GPU Nodes
[Login]$ srun -p gpu1 -n 1 --mem 2700 --time=5 --pty bash -i
[GPU]$ gcc -o a.out sample.c
Note
A program compiled on the login nodes with GCC can be executed in the pre/post environment.
5.2. Intel Compiler¶
5.2.1. Pre/Post-Processing Nodes¶
To use the Intel compiler in the pre/post environment, start an interactive job on a compute node in the pre/post environment and set the environment variables.
The following are examples of using the Intel compiler.
Large memory nodes
[Login]$ srun -p mem1 -n 1 --mem 27G --time=5 --pty bash -i
[MEM]$ . /opt/intel/oneapi/setvars.sh intel64
[MEM]$ icc -o a.out sample.c
GPU nodes
[Login]$ srun -p gpu1 -n 1 --mem 2700 --time=5 --pty bash -i
[GPU]$ . /opt/intel/oneapi/setvars.sh intel64
[GPU]$ icc -o a.out sample.c
Note
A program compiled on the login nodes with the Intel compiler can be executed in the pre/post environment.
5.3. NVIDIA CUDA Compiler¶
Log in to a GPU node with an interactive job and set the environment variables.
The following are examples of using the CUDA compiler.
[Login]$ srun -p gpu1 -n 1 --mem 2700 --time=5 --pty bash -i
[GPU]$ export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
[GPU]$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
[GPU]$ nvcc -o a.out sample.cu
Attention
The NVIDIA CUDA compiler is only available in the GPU nodes.
To run the compiled a.out, you need to allocate GPU resources with SLURM options. (For details, see the sections on Batch Job and subsequent topics.)
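For example, the compiled program can be executed in an interactive allocation with one GPU as follows (a minimal sketch; adjust the resources to your needs):
[Login]$ srun -p gpu1 -n 1 --mem 2700 --time=5 --gpus-per-node=1 ./a.out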
5.4. Fujitsu Compiler (cross compiler)¶
The following are examples of using the Fujitsu compiler.
[Login]$ srun -p gpu1 -n 1 --mem 2700 --time=5 --pty bash -i
[GPU]$ frtpx -o a.out sample.f90
6. Job Submission¶
6.1. Job Queue (Partition)¶
The table below shows the job queues currently available. [1]
| Queue (PartitionName) | Node (Nodes) | Default elapsed time [h] (DefaultTime) | Maximum elapsed time [h] (MaxTime) | Default #CPUs per job (cpus-per-task) | Maximum #CPUs per job (MaxTRESPerJob=cpu) | Default memory per job (DefMemPerNode) | Maximum memory per job (MaxMemPerNode) | Maximum #jobs per user (MaxJobPerUser) | Maximum #nodes per job (MaxNodes) |
|---|---|---|---|---|---|---|---|---|---|
| gpu1 | pps[01-06] | 0.5 | 3 | 1 | 72 | 2700M | 186G | 5 | 1 |
| gpu2 | pps[07-08] | 0.5 | 24 | 1 | 36 | 2700M | 93G | 1 | 1 |
| mem1 | ppm01 | 0.5 | 3 | 1 | 224 | 27G | 5020G [2] | 5 | 1 |
| mem2 | ppm[02-03] | 0.5 | 24 | 1 | 56 | 27G | 1500G | 1 | 1 |
| ondemand-reserved [3] | wheel[1-2] | 0.5 | 720 | 1 | 8 | 4G | 32G | 50 | 1 |
Note
“gpu1” and “mem1” are provided as queues for short-time jobs, and “gpu2” and “mem2” as queues for long-time jobs.
When a job is submitted without specifying the amount of resources, the default values are applied. Specify the number of CPUs, the amount of memory, and the elapsed time required in the job script or with Slurm command options, as in the example below.
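For example, the required resources can be given directly as command options when submitting a job (the values and the script name job.sh below are only illustrative):
[Login]$ sbatch -p gpu1 -n 4 --mem 10800 -t 01:00:00 job.sh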
6.2. Batch Job¶
To submit a job to the compute nodes, create a job script and submit it with SLURM commands on the login node.
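A typical workflow is to submit the job script with sbatch, check its status with squeue, and cancel it with scancel if necessary (the script name job.sh is an example):
[Login]$ sbatch job.sh
[Login]$ squeue -u $USER
[Login]$ scancel <jobid>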
6.2.1. Sequential Job¶
The following are examples for sequential jobs.
Large memory node (1cpu)
#!/bin/bash
#SBATCH -p mem1 # Specifying a queue
#SBATCH -n 1 # Specifying the number of CPUs
#SBATCH --mem 27G # Specifying memory usage [GB]
./a.out
Large memory node (1node)
#!/bin/bash
#SBATCH -p mem1 # Specifying a queue
#SBATCH -n 224 # Specifying the number of CPUs
#SBATCH --mem 5020G # Specifying memory usage [GB]
./a.out
GPU node (1cpu)
#!/bin/bash
#SBATCH -p gpu1 # Specifying a queue
#SBATCH -n 1 # Specifying the number of CPUs
#SBATCH --mem 2700 # Specifying memory usage [MB]
./a.out
GPU node (1node)
#!/bin/bash
#SBATCH -p gpu1 # Specifying a queue
#SBATCH -n 72 # Specifying the number of CPUs
#SBATCH --mem 186G # Specifying memory usage [GB]
./a.out
6.2.2. OpenMP¶
The following are examples of using OpenMP.
Intel Compiler + OpenMP
#!/bin/bash
#SBATCH -p gpu1 # Specifying a queue
#SBATCH --cpus-per-task=72 # Specifying the number of CPUs to use for each task
export OMP_NUM_THREADS=72
export KMP_AFFINITY=granularity=fine,compact
. /opt/intel/oneapi/setvars.sh intel64
./a.out
GCC + OpenMP
#!/bin/bash
#SBATCH -p mem1 # Specifying a queue
#SBATCH --cpus-per-task=224 # Specifying the number of CPUs to use for each task
export OMP_NUM_THREADS=224
./a.out
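The scripts above assume that a.out was compiled with OpenMP enabled, for example with the typical flags below (after setting up the Intel environment as described in Section 5.2, where applicable):
[GPU]$ icc -qopenmp -o a.out sample.c
[MEM]$ gcc -fopenmp -o a.out sample.c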
6.2.3. MPI¶
The following are examples of using MPI.
Intel Compiler + Intel MPI
#!/bin/bash
#SBATCH -p mem1 # Specifying a queue
#SBATCH --cpus-per-task=224 # Specifying the number of CPUs to use for each task
#SBATCH -t 00:10:00 # Specifying elapsed time limits
. /opt/intel/oneapi/setvars.sh intel64
mpiexec -n 224 ./a.out
GCC + OpenMPI
#!/bin/bash
#SBATCH -p gpu1 # Specifying a queue
#SBATCH -n 72 # Specifying the number of tasks
#SBATCH -t 00:10:00 # Specifying elapsed time limits
. /vol0004/apps/oss/spack/share/spack/setup-env.sh
spack load openmpi@3.1.6%gcc@8.5.0
mpiexec --use-hwthread-cpus -n 72 --mca btl_openib_if_include mlx5_0 ./a.out
Attention
When using GCC + Open MPI, the option --use-hwthread-cpus is required if you want to use more than half of the maximum number of cores for each node.
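The MPI examples above assume that a.out was built with the corresponding MPI wrapper compiler after setting up the environment as shown in the job scripts, for example:
Intel Compiler + Intel MPI
[MEM]$ mpiicc -o a.out sample.c
GCC + OpenMPI
[GPU]$ mpicc -o a.out sample.c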
6.2.4. GPU¶
The following are examples of using GPU.
#!/bin/bash
#SBATCH -p gpu1 # Specify a queue
#SBATCH --gpus-per-node=1 # Specify the number of GPUs to use
./a.out
To use 2 GPUs simultaneously, specify the option --gpus-per-node=2.
#!/bin/bash
#SBATCH -p gpu1 # Specify a queue
#SBATCH --gpus-per-node=2 # Specify the number of GPUs per node
./a.out
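If needed, the GPUs allocated to the job can be listed from within the script with nvidia-smi, for example (a minimal sketch):
#!/bin/bash
#SBATCH -p gpu1 # Specify a queue
#SBATCH --gpus-per-node=2 # Specify the number of GPUs per node
nvidia-smi -L # List the GPUs visible to this job
./a.out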
6.3. Interactive Job¶
The following are examples of interactive jobs.
Large memory node (1cpu)
[Login]$ srun -p mem1 -n 1 --mem 27000 --time=5 --pty bash -i
Large memory node (1node)
[Login]$ srun -p mem1 -n 224 --mem 5020G --time=5 --pty bash -i
GPU node (1cpu without GPUs)
[Login]$ srun -p gpu1 -n 1 --mem 2600 --time=5 --pty bash -i
GPU node (1node without GPUs)
[Login]$ srun -p gpu1 -n 72 --mem 185G --time=5 --pty bash -i
GPU node (1cpu with a GPU)
[Login]$ srun -p gpu1 -n 1 --mem 2700 --time=5 --gpus-per-node=1 --pty bash -i
[GPU]$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01 Driver Version: 535.216.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-32GB Off | 00000000:18:00.0 Off | 0 |
| N/A 41C P0 37W / 250W | 0MiB / 32768MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
GPU node (1node with 2GPUs)
[Login]$ srun -p gpu1 -n 72 --mem 186G --time=5 --gpus-per-node=2 --pty bash -i
[GPU]$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01 Driver Version: 535.216.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-32GB Off | 00000000:18:00.0 Off | 0 |
| N/A 39C P0 39W / 250W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE-32GB Off | 00000000:AF:00.0 Off | 0 |
| N/A 38C P0 38W / 250W | 0MiB / 32768MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
6.4. Scratch Directory (available in the large memory nodes)¶
Attention
A scratch directory (/worktmp/${SLURM_JOBID}) is available for each job.
The scratch area is shared with other jobs/users and has a maximum total capacity of 1 TB.
If the maximum capacity has already been consumed by other jobs/users, write access will fail.
After the job is finished, the directory (/worktmp/${SLURM_JOBID}) is automatically deleted.
The following is an example of using the scratch area.
#!/bin/bash
#SBATCH -p mem1
#SBATCH -n 1
export TMPDIR=/worktmp/${SLURM_JOBID}
dd if=/dev/zero of=${TMPDIR}/foo.dat bs=100M count=1
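Because the scratch directory is deleted automatically when the job finishes, copy any files you need afterwards to a permanent location before the script ends. A minimal sketch (the file name result.dat and the destination are only examples):
#!/bin/bash
#SBATCH -p mem1
#SBATCH -n 1
export TMPDIR=/worktmp/${SLURM_JOBID}
./a.out > ${TMPDIR}/result.dat # Write intermediate output to the scratch area
cp ${TMPDIR}/result.dat ${HOME}/ # Copy the result back before the job ends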
6.5. Misc¶
To use the X Window System in the pre/post environment, use interactive mode via the srun command with the --x11 option.
[Login]$ srun --x11 -p gpu1 -n 1 --time=5 --pty bash -i
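Once the interactive session starts, you can confirm that X forwarding is set up by checking that the DISPLAY variable is defined before launching an X client:
[GPU]$ echo $DISPLAY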