3.1.7.11.2. Automatic parallel
The following example shows how to compile a Fortran program with automatic parallelization and run it as a single-node job.
- Prepare the source program. The sample program is the same as the one in “Fortran sample program (sequential)”.
 1       parameter(n=11776)
 2       real*8 a(n+1,n),b(n+1,n),c(n+1,n)
 3       real*8 t0,t1,second
 4 !
 5 ! print message
 6 !
 7       write(6,500)
 8       write(6,510) n,n
 9   500 format(" *** matrix multiply *** ")
10   510 format(' (NPROW,NPCOL) : (',I6,',',I6,')')
11
12 !
13 ! initialize matrix
14 !
15       do j=1,n
16         do i=1,n
17           a(i,j)=0.1d0
18           b(j,i)=0.3d0
19           c(i,j)=0.0d0
20         end do
21       end do
22 !
23 ! do matrix multiply
24 !
25       t0 = second()
26       do j=1,n
27         do i=1,n
28           do k=1,n
29             c(i,j) = c(i,j) + a(i,k) * b(k,j)
30           end do
31         end do
32       end do
33       t1 = second()
34 !
35 ! print result
36 !
37       write(6,520) c(1,1)
38       write(6,530) t1-t0
39   520 format(" c(1,1)=",g10.4)
40   530 format(" time :",f10.2," sec")
41       end
42 !----
43       function second()
44       real*8 second,t
45       call gettod(t)
46       second=t*1.e-6
47       end
- Compile the sample program. The -Kparallel option enables automatic parallelization.
[_LNlogin]$ frtpx -V -Kfast,parallel,optmsg=2 -Nlst=t -o sample_parallel sample.f
frtpx: Fujitsu Fortran Compiler 4.1.0 tcsds-1.2.24
jwd_fortpx: Fujitsu Fortran Compiler 4.1.0 (Feb 26 2020 07:41:18)
Fortran diagnostic messages: program name(main)
jwd5001p-i "sample.f", line 15: DO loop with DO variable 'j' is parallelized.
jwd8320o-i "sample.f", line 15: Loop blocking is performed with size 48.
jwd8320o-i "sample.f", line 16: Loop blocking is performed with size 48.
jwd6001s-i "sample.f", line 16: SIMD conversion is applied to DO loop with DO variable 'i'.
jwd8663o-i "sample.f", line 16: This loop is not software pipelined because the software pipelining does not improve the performance.
jwd8202o-i "sample.f", line 16: Loop unrolled 2 times.
jwd8220o-i "sample.f", line 26: Optimizations is performed in this program unit with possibility of side effects. See informational messages below to determine which such optimizations have been performed.
jwd8331o-i "sample.f", line 26: This DO loop was changed to the library call(matmul).
GNU assembler version 2.30 (aarch64-linux-gnu) using BFD version version 2.30-49.el7
GNU ld version 2.30-49.el7
Supported emulations: aarch64linux aarch64elf aarch64elf32 aarch64elf32b aarch64elfb armelf armelfb aarch64linuxb aarch64linux32 aarch64linux32b armelfb_linux_eabi armelf_linux_eabi i386pep i386pe
flistpx: Fujitsu Listing Processor 4.1.0 (Jan 9 2020 14:46:36)
Note
Specifying -Nlst=t outputs a compilation listing that contains automatic parallelization information, optimization information, and statistics. The listing file is named after the source file with the extension lst (sample.lst in this example).
Specifying -Koptmsg=2 (given here as part of -Kfast,parallel,optmsg=2) outputs messages about optimizations such as automatic parallelization, SIMD conversion, and loop unrolling, for example jwd5001p-i and jwd6001s-i in the output above:
- jwd5001p-i: the DO loop has been parallelized. The message shows the line of the auto-parallelized DO loop and its DO variable.
- jwd6001s-i: the DO loop has been converted to SIMD instructions. The message shows the line of the SIMDized DO loop and its DO variable.
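When the diagnostic output is long, it can help to keep only the auto-parallelization and SIMD messages. The following is a minimal sketch, assuming the messages can be piped from frtpx (whether they go to stdout or stderr is not stated here, so the usage line merges both with 2>&1). The function name filter_par_simd is hypothetical and not part of the Fujitsu toolchain.

```shell
# Hypothetical helper (not a Fujitsu command): read compiler diagnostics
# on stdin and print only the jwd5001p-i (DO loop parallelized) and
# jwd6001s-i (SIMD conversion applied) messages.
filter_par_simd() {
    grep -E 'jwd(5001p|6001s)-i'
}

# Example use on the login node (2>&1 captures the messages whether
# frtpx writes them to stdout or stderr):
#   frtpx -Kfast,parallel,optmsg=2 -o sample_parallel sample.f 2>&1 | filter_par_simd
```

Matching on the message IDs rather than on the message text keeps the filter independent of the wording that follows each ID.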
- Prepare the job script. A sample job script is available at /home/system/sample/Fortran/parallel/job_para.sh.
#!/bin/sh
#PJM -L "node=1"
#PJM -L "rscgrp=small"
#PJM -L "elapse=10:00"
#PJM -x PJM_LLIO_GFSCACHE=/vol000N
#PJM -g groupname
#PJM -s
# execute job
export OMP_NUM_THREADS=12
./sample_parallel
- Submit the job with the pjsub command.
[_LNlogin]$ pjsub job_para.sh
[INFO] PJM 0000 pjsub Job 112 submitted.
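When jobs are submitted from a script, it can be convenient to capture the job ID that pjsub prints and derive the standard-output file name from it. The following is a minimal sketch, assuming the pjsub message has exactly the form shown above with a plain numeric job ID, and that the output file is named Job name.Job ID.out as described in the result-checking step. Both function names are hypothetical.

```shell
# Hypothetical helper: extract the numeric job ID from a pjsub message such as
#   [INFO] PJM 0000 pjsub Job 112 submitted.
# Assumes a plain numeric ID; other ID formats are not handled.
job_id_from_pjsub() {
    sed -n 's/.*pjsub Job \([0-9][0-9]*\) submitted\..*/\1/p'
}

# Hypothetical helper: build the standard-output file name "<job name>.<job ID>.out".
stdout_file() {
    printf '%s.%s.out\n' "$1" "$2"
}

# Example use on the login node:
#   id=$(pjsub job_para.sh | job_id_from_pjsub)
#   cat "$(stdout_file job_para.sh "$id")"   # once the job has finished
```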
- Check the execution result. Standard output is written to the file Job name.Job ID.out (job_para.sh.112.out in this example).
[_LNlogin]$ cat job_para.sh.112.out
 *** matrix multiply ***
 (NPROW,NPCOL) : ( 11776, 11776)
 c(1,1)= 353.3
 time : 27.96 sec
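The printed value of c(1,1) can be checked by hand. Because the program sets every a(i,j) and b(i,j) with 1 ≤ i,j ≤ n to 0.1 and 0.3, each element of c is a sum of n identical products:

```latex
c(1,1) = \sum_{k=1}^{n} a(1,k)\,b(k,1) = n \times 0.1 \times 0.3 = 0.03 \times 11776 = 353.28
```

The g10.4 edit descriptor prints this with four significant digits as 353.3, which matches the output above.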