3.1.7.11.2. Automatic parallel

The following example compiles a Fortran program with automatic parallelization and runs it as a single-node job.

Prepare the source program.

The sample program is the same as the one in “Fortran sample program (sequential)”.

(Download sample code)

      parameter(n=11776)
      real*8 a(n+1,n),b(n+1,n),c(n+1,n)
      real*8 t0,t1,second
!
!  print message
!
      write(6,500)
      write(6,510) n,n
 500  format(" *** matrix multiply *** ")
 510  format(' (NPROW,NPCOL) : (',I6,',',I6,')')

!
!  initialize matrix
!
      do j=1,n
        do i=1,n
          a(i,j)=0.1d0
          b(j,i)=0.3d0
          c(i,j)=0.0d0
        end do
      end do
!
!  do matrix multiply
!
      t0 = second()
      do j=1,n
        do i=1,n
          do k=1,n
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
        end do
      end do
      t1 = second()
!
!  print result
!
      write(6,520) c(1,1)
      write(6,530) t1-t0
  520 format(" c(1,1)=",g10.4)
  530 format(" time :",f10.2," sec")
      end
!----
      function second()
      real*8 second,t
      call gettod(t)
      second=t*1.e-6
      end

Compile the sample program.

[_LNlogin]$ frtpx -V -Kfast,parallel,optmsg=2 -Nlst=t -o sample_parallel sample.f
frtpx: Fujitsu Fortran Compiler 4.1.0 tcsds-1.2.24
jwd_fortpx: Fujitsu Fortran Compiler 4.1.0 (Feb 26 2020 07:41:18)
Fortran diagnostic messages: program name(main)
  jwd5001p-i  "sample.f", line 15: DO loop with DO variable 'j' is parallelized.
  jwd8320o-i  "sample.f", line 15: Loop blocking is performed with size 48.
  jwd8320o-i  "sample.f", line 16: Loop blocking is performed with size 48.
  jwd6001s-i  "sample.f", line 16: SIMD conversion is applied to DO loop with DO variable 'i'.
  jwd8663o-i  "sample.f", line 16: This loop is not software pipelined because the software pipelining does not improve the performance.
  jwd8202o-i  "sample.f", line 16: Loop unrolled 2 times.
  jwd8220o-i  "sample.f", line 26: Optimizations is performed in this program unit with possibility of side effects. See informational messages below to determine which such optimizations have been performed.
  jwd8331o-i  "sample.f", line 26: This DO loop was changed to the library call(matmul).
GNU assembler version 2.30 (aarch64-linux-gnu) using BFD version version 2.30-49.el7
GNU ld version 2.30-49.el7
  Supported emulations:
   aarch64linux
   aarch64elf
   aarch64elf32
   aarch64elf32b
   aarch64elfb
   armelf
   armelfb
   aarch64linuxb
   aarch64linux32
   aarch64linux32b
   armelfb_linux_eabi
   armelf_linux_eabi
   i386pep
   i386pe
flistpx: Fujitsu Listing Processor 4.1.0 (Jan  9 2020 14:46:36)

Note

Specifying -Nlst=t outputs automatic parallelization information, optimization information, and statistical information to a listing file. The listing file takes the name of the source program with its extension replaced by lst.
Specifying -Koptmsg=2 outputs messages about optimizations such as automatic parallelization, SIMD conversion, and loop unrolling. In the above example, these are jwd5001p-i, jwd6001s-i, and so on.
jwd5001p-i means that the DO loop was parallelized automatically. The message identifies the parallelized DO loop and its DO variable.
jwd6001s-i means that the DO loop was converted to SIMD. The message identifies the SIMD-converted DO loop and its DO variable.
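
When the message list is long, it can help to pull out only the parallelization and SIMD messages. The following is a minimal sketch, not part of the sample set: the function name summarize_optmsg is hypothetical, and it assumes the diagnostic lines have the same layout as in the compile output above (message ID, file name, the word "line", then the line number followed by a colon). Whether frtpx writes these messages to standard output or standard error is not stated here, so the sketch simply reads them from standard input.

```shell
#!/bin/sh
# summarize_optmsg (hypothetical helper): reads compiler diagnostics on stdin
# and prints one short line per jwd5001p-i / jwd6001s-i message.
summarize_optmsg() {
    awk '
        $1 == "jwd5001p-i" { kind = "parallelized" }
        $1 == "jwd6001s-i" { kind = "SIMD" }
        kind != "" {
            # Field 4 looks like "15:"; strip the trailing colon.
            line = $4
            sub(/:$/, "", line)
            print kind, "line", line
            kind = ""
        }'
}

# Demo input: three message lines copied from the compile output above.
summarize_optmsg <<'EOF'
  jwd5001p-i  "sample.f", line 15: DO loop with DO variable 'j' is parallelized.
  jwd8320o-i  "sample.f", line 15: Loop blocking is performed with size 48.
  jwd6001s-i  "sample.f", line 16: SIMD conversion is applied to DO loop with DO variable 'i'.
EOF
```

On a real compile, the same function could be fed the saved compiler messages instead of the here-document.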

Prepare the job script.

A sample job script is available at /home/system/sample/Fortran/parallel/job_para.sh.

(Download sample code)

#!/bin/sh
#PJM -L "node=1"
#PJM -L "rscgrp=small"
#PJM -L "elapse=10:00"
#PJM -x PJM_LLIO_GFSCACHE=/vol000N
#PJM -g groupname
#PJM -s

# execute job
export OMP_NUM_THREADS=12
./sample_parallel

Submit the job with the pjsub command.

[_LNlogin]$ pjsub job_para.sh
[INFO] PJM 0000 pjsub Job 112 submitted.
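
If the job ID is needed later in a script, it can be taken from the pjsub message. The sketch below is an illustration only: the function name job_id_from_pjsub is hypothetical, the message format is assumed to match the line shown above, and the output file name assumes that the job name equals the script name, as it does in this example.

```shell
#!/bin/sh
# job_id_from_pjsub (hypothetical helper): extracts the numeric job ID
# from a pjsub submission message read on stdin.
job_id_from_pjsub() {
    sed -n 's/.*pjsub Job \([0-9][0-9]*\) submitted\..*/\1/p'
}

# Demo input: the message copied from the submission above. On the real
# system one would pipe the output of "pjsub job_para.sh" instead.
msg='[INFO] PJM 0000 pjsub Job 112 submitted.'
job_id=$(printf '%s\n' "$msg" | job_id_from_pjsub)

# Assumption: job name == script name, giving <job name>.<job ID>.out.
out_file="job_para.sh.${job_id}.out"
echo "$out_file"
```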

Check the execution result.

Standard output is written to a file named Job name.Job ID.out.

[_LNlogin]$ cat job_para.sh.112.out
*** matrix multiply ***
 (NPROW,NPCOL) : ( 11776, 11776)
 c(1,1)= 353.3
 time :     27.96 sec
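
The printed c(1,1) can be checked by hand. The program sets every element of a to 0.1 and every element of b to 0.3, so each element of c is the sum of n products: c(1,1) = n × 0.1 × 0.3 = 11776 × 0.03 = 353.28, which the g10.4 edit descriptor prints as 353.3. The following sketch automates that comparison; the function name check_result is hypothetical, and rounding with awk's %.4g is assumed to match the Fortran formatting for this value.

```shell
#!/bin/sh
# Sanity check for the sample output: compare the printed c(1,1) with
# n * 0.1 * 0.3, the value implied by the initialization in sample.f.

n=11776   # matrix order, from the parameter statement in the source

# Expected value rounded to 4 significant digits, like the g10.4 format.
expected=$(awk -v n="$n" 'BEGIN { printf "%.4g", n * 0.1 * 0.3 }')

# check_result (hypothetical helper): reads job output on stdin and
# prints "OK <value>" on a match, or "MISMATCH" with a nonzero status.
check_result() {
    awk -v want="$expected" '
        /c\(1,1\)=/ { sub(/.*=/, ""); got = $0 + 0; found = 1 }
        END {
            if (found && got == want + 0) print "OK", got
            else { print "MISMATCH"; exit 1 }
        }'
}

# Demo input: two lines copied from the execution result above. On the real
# system one would run: check_result < job_para.sh.112.out
printf '%s\n' ' c(1,1)= 353.3' ' time :     27.96 sec' | check_result
```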
