3.1.11.3. Instant Performance Profiler

The Instant Performance Profiler measures and outputs statistical information for the entire program through sampling analysis.

3.1.11.3.1. Overview

The Instant Performance Profiler consists of two commands: the fipp command, which measures profile data, and the fipppx command, which outputs the profile result from the measured data. The Instant Performance Profiler can output the following statistical information.

Statistical time information
CPU performance characteristics
Cost information
Call graph information
Source code information

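The flow of using the Instant Performance Profiler is as follows.

Add the measurement interval specification routine to the source code (only when measuring a specified section)
Compile and link the program
Measure profile data with the fipp command
Output the profile result with the fipppx command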

3.1.11.3.2. Addition of measurement interval specification routine

To measure profile data for a specified section only, add the measurement interval specification routine / function to the source code.
The measurement interval specification routine can be called as a Fortran subroutine or as a C/C++ function.
When using the C/C++ functions, you must declare the function prototypes or include the profiler header file.

Language type	Header file	Subroutine / function name	Argument	Function
Fortran	None	fipp_start	None	Start measuring Cost information
Fortran	None	fipp_stop	None	End measuring Cost information
C/C++	fj_tool/fipp.h	void fipp_start	None	Start measuring Cost information
C/C++	fj_tool/fipp.h	void fipp_stop	None	End measuring Cost information

Attention

To collect measurement data using these subroutines / functions, specify the -Sregion option of the fipp command.

When calling these subroutines / functions multiple times, be sure to call them in the order fipp_start, then fipp_stop. If fipp_start is called again before fipp_stop is called, or if fipp_stop is called before fipp_start, a warning message is printed and the call is ignored. Also, if the process ends without calling fipp_stop, profile data for that section is not measured.

When these subroutines / functions are called multiple times, the results of all measurement intervals are added together.

For MPI programs, call these subroutines / functions in every process that you want to measure. Profile data is not measured for processes that do not call them.

An example of using the measurement interval specification routine is shown below.

Fortran example

Sample specification example

program main
...
do i=1,10000
  ...
  call fipp_start   ! Start measuring
  do j=1,10000
  ...
  end do
  call fipp_stop    ! End measuring
end do
end program main

Example of measuring all processes (measurement starts before calling the mpi_init subroutine)

call fipp_start     ! Start measuring
call mpi_init(err)
...
call mpi_finalize(err)
call fipp_stop      ! End measuring

Example of measuring all processes (measurement starts immediately after calling the mpi_init subroutine)

call mpi_init(err)
call fipp_start     ! Start measuring
...
call fipp_stop      ! End measuring
call mpi_finalize(err)

Example of measuring only process 0

call mpi_init(err)
call mpi_comm_rank(mpi_comm_world,rank,err)
if(rank==0) then
  call fipp_start   ! Only process 0, start measuring
end if
  ...
if(rank==0) then
  call fipp_stop    ! Only process 0, end measuring
end if
call mpi_finalize(err)

C/C++ example

Sample specification example

#include "fj_tool/fipp.h" // Include header file
...
int main(void)
{
  int i,j;
  for(i=0;i<10000;i++){
    ...
    fipp_start();   // Start measuring
    for(j=0;j<10000;j++){
       ...
    }
    fipp_stop();    // End measuring
  }
  return 0;
}

Example of measuring all processes (start measurement before calling the MPI_Init function)

fipp_start();       // Start measuring
MPI_Init(&argc, &argv);
...
MPI_Finalize();
fipp_stop();        // End measuring

Example of measuring all processes (measurement starts immediately after calling the MPI_Init function)

MPI_Init(&argc, &argv);
fipp_start();       // Start measuring
...
fipp_stop();        // End measuring
MPI_Finalize();

Example of measuring only process 0

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if(rank==0){
  fipp_start();     // Only process 0, start measuring
}
...
if(rank==0){
  fipp_stop();      // Only process 0, end measuring
}
MPI_Finalize();
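
The ordering rule for fipp_start and fipp_stop can be checked roughly before a job is submitted. The following shell sketch is a hypothetical helper (check_fipp_pairs is not part of the profiler). It assumes that the calls of interest appear in a single source file and only inspects their textual order, so it cannot follow loops, branches, or calls made from other files.

```shell
# check_fipp_pairs FILE
# Hypothetical helper (not part of the profiler): scans FILE top to bottom and
# reports fipp_start / fipp_stop calls that do not alternate in textual order.
# Matching is case-insensitive so that Fortran sources are covered as well.
# Returns 0 when the order looks correct, 1 otherwise.
check_fipp_pairs() {
  awk '
    BEGIN { in_section = 0; bad = 0 }
    tolower($0) ~ /fipp_start/ {
      if (in_section) {
        printf "%s:%d: fipp_start called again before fipp_stop\n", FILENAME, FNR
        bad = 1
      }
      in_section = 1
    }
    tolower($0) ~ /fipp_stop/ {
      if (!in_section) {
        printf "%s:%d: fipp_stop called before fipp_start\n", FILENAME, FNR
        bad = 1
      }
      in_section = 0
    }
    END {
      if (in_section) {
        printf "%s: fipp_start has no matching fipp_stop\n", FILENAME
        bad = 1
      }
      exit bad
    }
  ' "$1"
}
```

A possible use is `check_fipp_pairs main.c || exit 1` in a build script, where main.c stands for your own source file. A clean result does not prove that the run-time order is correct. It only catches obvious slips such as a missing fipp_stop.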

3.1.11.3.3. Compiling / Linking

The tool library required to use the Instant Performance Profiler function is linked by default when compiling / linking.
Therefore, there is no need to specify special options as in the following example.
Compile / link is done at the login node or the compute node.

Compile / link example (MPI program)

[_LNlogin]$ mpifrtpx  -Kfast,parallel  "Source file name"

Compile / link example (sequential / thread parallel program)

[_LNlogin]$ frtpx  -Kfast,parallel  "Source file name"
Attention

When compiling and linking in separate steps, also specify at link time the optimization options that were specified at compile time, so that the appropriate profiler library is linked. For example, for a program that uses OpenMP, specify the -Kopenmp option when linking.
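
As an illustration of the note on separate compilation, the sketch below compiles two sources and links them with the same -K options at every step. The file names (main.f90, sub.f90), the variable names, and the function name build_split are placeholders chosen for this example. The compiler is taken from the FC variable so that the commands can be printed with FC=echo instead of being executed.

```shell
# build_split: compile two sources separately, then link them.
# FC, KOPTS and the file names are placeholders for illustration only.
FC=${FC:-frtpx}              # compiler driver; set FC=echo for a dry run
KOPTS="-Kfast,openmp"        # the same -K options are passed at every step

build_split() {
  "$FC" $KOPTS -c main.f90 &&
  "$FC" $KOPTS -c sub.f90 &&
  "$FC" $KOPTS main.o sub.o -o a.out   # the link step repeats -Kopenmp
}
```

Keeping the options in one variable is a simple way to make sure the link step cannot drift from the compile steps.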

3.1.11.3.3.1. About Fortran translation options

The following profiler-related translation options are available when translating Fortran programs.

These options take effect at link time.

Option	Description
-Nfjprof	Link the tool library. When omitted, -Nfjprof is enabled.
-Nnofjprof	Do not link the tool library. The profiler cannot be used.

3.1.11.3.3.2. About C/C++ translation options

For the C/C++ language, there are two compilation modes: trad mode and clang mode.
The following profiler-related translation options are available for each mode.
These options take effect at link time.

Mode	Option	Description
trad	-Nfjprof	Link the tool library. When omitted, -Nfjprof is enabled.
trad	-Nnofjprof	Do not link the tool library. The profiler cannot be used.
clang	-ffj-fjprof	Link the tool library. When omitted, -ffj-fjprof is enabled.
clang	-ffj-no-fjprof	Do not link the tool library. The profiler cannot be used.

3.1.11.3.4. Measurement of profiler data

Profile data is measured with the fipp command.

This operation is performed on a compute node.

An execution example of the fipp command is shown below.

[Condition]

The fipp command is placed in front of the command that runs the executable (a.out).

The data is collected in the directory (profiling_data) specified with the -d option.

[MPI program]
#!/bin/sh
#
#PJM -L "node=2x2x2"
#PJM -L "elapse=01:00:00"
#PJM -x PJM_LLIO_GFSCACHE=/vol000N
#PJM -g groupname
#PJM -s
#
fipp -C -d profiling_data -Icall,mpi mpiexec ./a.out
[Sequential / Thread parallel program]
#!/bin/sh
#
#PJM -L "node=1"
#PJM -L "elapse=01:00:00"
#PJM -x PJM_LLIO_GFSCACHE=/vol000N
#PJM -g groupname
#PJM -s
#

fipp -C -d profiling_data -Icall,cpupa ./a.out
Attention

If any of the following operations is performed on the profile data measured by the fipp command, correct operation is not guaranteed.

Edit profile data

Add, delete, and rename profile data
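
Because the directory given to the -d option must be new or empty (see the -d entry in the option list), a job script can check this before the measurement starts. The following is a minimal sketch. The function name profile_dir_ok is a placeholder and not a profiler command.

```shell
# profile_dir_ok DIR
# Returns 0 if DIR does not exist yet or is an empty directory,
# which is what the -d option of fipp requires; returns 1 otherwise.
# The function name is a placeholder, not a profiler command.
profile_dir_ok() {
  dir=$1
  [ -e "$dir" ] || return 0        # new directory: fine
  [ -d "$dir" ] || return 1        # exists but is not a directory
  [ -z "$(ls -A "$dir")" ]         # succeeds only for an empty directory
}

# Example use in a job script (directory name as in the scripts above):
#   profile_dir_ok profiling_data || { echo "profiling_data is not empty" >&2; exit 1; }
#   fipp -C -d profiling_data -Icall,cpupa ./a.out
```

Checking first avoids reusing a directory that still holds data from an earlier run. Such data should not be edited, added to, deleted, or renamed by hand.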

3.1.11.3.5. Profiler option

This section describes the options of the fipp command.

Option

Description

-C

(Required option)

Specifies measurement of profile data. If this option is omitted, an error message is output and the execution of the program ends.

-d profile_data

(Required option)

Specify the directory for storing profile data. If this option is omitted, an error message is output and the execution of the program ends.

In profile_data, specify the directory name for storing the profile data as a relative or absolute path. The specified directory must be new or empty.

When analyzing a program that changes its current directory during execution, specify profile_data as an absolute path.

The Profiler creates a subdirectory for every 1000 files under profile_data. Therefore, even for large jobs, you only need to specify profile_data.

exec-file [ exec_option … ]

Specify the executable file and its options for profile data measurement. For an MPI program, specify the command line starting from mpiexec.

-H[mode={all|user}]

Specify the measurement details of the CPU operation status. Specify either all or user for the sub option mode=. When this option or the sub option mode={all|user} is omitted, mode=all is enabled.

mode=all
Performs measurement in kernel mode and user mode.

mode=user
Performs measurement in user mode only.

-Iitem

(Hyphen + capital letter I)

Specify the Instant Performance Profiler items to collect. To specify multiple items, separate them with commas.

item: {{call | nocall} | {cpupa | nocpupa} | {mpi | nompi}}

call:
Collect Call graph information.

nocall:
Do not collect Call graph information. When omitted, nocall is enabled.

cpupa:
Measure the CPU performance characteristics. When omitted, cpupa is enabled.

nocpupa:
Do not measure the CPU performance characteristics.

mpi:
Measure the MPI Cost information. When omitted and the target is an MPI program, mpi is enabled.

nompi:
Do not measure the MPI Cost information. When omitted and the target is a non-MPI program, nompi is enabled.

-i interval

Specify the sampling interval for measuring profile data as an integer in milliseconds, in the range of 10 to 3,600,000. When this option is omitted, -i 100 is enabled.

-L{shared | noshared}

Specify how to measure a shared library that was generated with the translation option -Nline or -ffj-line.

shared
The following information in the shared library with line information is measured.

Starting line number of the procedure

End line number of the procedure

Loop cost distribution information

Line cost distribution information

noshared
The above information in the shared library with line information is not measured. When omitted, noshared is enabled.

-l limit

Specify the number of procedures for which information is measured.

Procedures beyond this number are added up and measured as __other__.

When this option is omitted, -l 0 is enabled. Specify an integer value in the range of 0 to 2,147,483,647 for limit. If 0 is specified for limit, all procedures are measured.

-m memsize

Specify the work memory size to be used for measurement as an integer value (KB).

A working memory area is allocated for each thread.

When this option is omitted, -m 3000 is enabled. Specify an integer value in the range of 1 to 2,147,483 for memsize.

-P{userfunc | nouserfunc}

This option specifies how the procedure cost is attributed. It applies to a program that mixes objects compiled with the -Nline or -ffj-line option (objects with line information) and objects compiled with the -Nnoline or -ffj-no-line option (objects without line information). The standard library, and shared libraries when -Lnoshared is specified, are handled as objects without line information.

userfunc
If a cost would be attributed to a procedure of an object without line information, the callers of that procedure are traced back through the call graph information. If a procedure of an object with line information is found, the cost is attributed to that procedure. If no such procedure is found, the cost is not attributed. When specifying the -Puserfunc option, you must also specify the -Icall option. If the -Icall option is not specified, an error message is output and the fipp command terminates.

nouserfunc
If a cost would be attributed to a procedure of an object without line information, the cost is attributed to that procedure. However, the procedure start and end lines are not output.

-S{all | region}

Specify the measurement interval for profile data.

all
Measure the entire program. When omitted, all is enabled.

region
Measures the sections specified by the measurement interval specification routine. The measurement interval specification routine must be inserted in the source code.

-W{spawn|nospawn}

Specify the measurement method for dynamically generated processes. When this option is omitted, -Wspawn is enabled for an MPI program and -Wnospawn is enabled for a non-MPI program.

spawn
Measure dynamically generated process statistics

nospawn
Does not measure dynamically generated process statistics
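
The value ranges and the -Puserfunc / -Icall dependency described above can be checked in the job script before fipp is started. The sketch below is a hypothetical pre-flight check. The function names in_range and check_fipp_args and the argument order are assumptions made for this example, while the numeric limits are the ones listed in the option descriptions.

```shell
# in_range VALUE MIN MAX: succeed if VALUE is a non-negative integer in [MIN, MAX].
in_range() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;               # reject empty or non-numeric input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# check_fipp_args INTERVAL MEMSIZE LIMIT PMODE ITEMS
# Hypothetical pre-flight check using the ranges from the option list:
#   -i 10..3600000 (ms), -m 1..2147483 (KB), -l 0..2147483647,
#   and -Puserfunc requires "call" among the -I items.
check_fipp_args() {
  in_range "$1" 10 3600000   || { echo "-i out of range: $1" >&2; return 1; }
  in_range "$2" 1 2147483    || { echo "-m out of range: $2" >&2; return 1; }
  in_range "$3" 0 2147483647 || { echo "-l out of range: $3" >&2; return 1; }
  if [ "$4" = userfunc ]; then
    case ",$5," in
      *,call,*) ;;                          # call graph is collected: OK
      *) echo "-Puserfunc needs -Icall" >&2; return 1 ;;
    esac
  fi
  return 0
}
```

For example, `check_fipp_args 100 3000 0 userfunc call,mpi` succeeds, and the same values can then be passed on as `-i 100 -m 3000 -l 0 -Puserfunc -Icall,mpi`.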

3.1.11.3.6. Output profile result

The fipppx command outputs the results of the profile data measured with the fipp command.

Perform this operation on the login node.

An execution example of the fipppx command is shown below.

On the login node
Use the fipppx command.
On a compute node
Use the fipp command.

[Command execution example]

[_LNlogin]$ fipppx -A -pall -Ibalance,call -d profiling_data

In this example, output of information for all processes is specified (-pall). For highly parallel runs, the output can be enormous if all processes are targeted. If you know in advance which processes to focus on, you can limit the output by specifying process numbers, for example -p0,1 (processes 0 and 1).
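
When only a few processes are of interest, one report per process can be written to separate files. The sketch below combines the -p and -o options for that purpose. The function name report_ranks, the FIPPPX variable, and the output file names are placeholders for this example. Setting FIPPPX=echo prints the commands instead of running them.

```shell
# report_ranks DIR RANK...
# Writes one text report per listed process number, e.g. report_rank0.txt.
# FIPPPX and the output file names are placeholders chosen for this sketch.
FIPPPX=${FIPPPX:-fipppx}      # set FIPPPX=echo to print the commands instead

report_ranks() {
  dir=$1
  shift
  for rank in "$@"; do
    "$FIPPPX" -A -d "$dir" -p"$rank" -Icall -o "report_rank${rank}.txt" || return 1
  done
}

# Example: reports for processes 0 and 3 from the data in profiling_data.
#   report_ranks profiling_data 0 3
```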

3.1.11.3.6.1. fipppx command option

Option	Function / measurement value (unit)
-A (Required option)	Specify output processing of profile results.
-d profile_data (Required option)	Specify the directory where the profile data is stored in profile_data as a relative or absolute path.
-f func_name	Specify in func_name the name of a procedure used by the program to output information about that procedure. If the fipp command did not measure information for func_name, or if the cost of func_name is 0, no information is output even when `-f func_name` is specified.
-Iitem (Hyphen + capital letter I)	Specify the items to output as profile results. To specify multiple items, separate them with commas. item: {{`balance` | `nobalance`} | {`call` | `nocall`} | {`cpupa` | `nocpupa`} | {`mpi` | `nompi`} | {`src[:path]` | `nosrc`}} balance: Output cost balance information in Cost information. nobalance: Do not output cost balance information in Cost information. When omitted, `nobalance` is enabled. call: Output Call graph information. nocall: Do not output Call graph information. When omitted, `nocall` is enabled. cpupa: Output the CPU performance characteristics. When omitted, `cpupa` is enabled. nocpupa: Do not output the CPU performance characteristics. mpi: Output MPI Cost information. When omitted and the target is an MPI program, `mpi` is enabled. nompi: Do not output MPI Cost information. When omitted and the target is a non-MPI program, `nompi` is enabled. src[:path]: Output Source code information and the cost per line. The per-line cost does not include the cost output with the `-Impi` option. For `path`, specify the directory where the source code exists. To specify multiple paths, separate them with colons (:). nosrc: Do not output Source code information. When omitted, `nosrc` is enabled.
-l limit (Lowercase l)	Specify the number of procedures for which information is output. When this option is omitted, `-l 10` is enabled. Specify an integer value in the range of 0 to 2,147,483,647 for limit. If 0 is specified for limit, all procedures are output.
-o outfile	Specify the output destination of the profile result. For outfile, specify the output file name as a relative or absolute path, or specify “stdout”. When this option is omitted, `-ostdout` is enabled.
-pp_no	Specify the processes to output in the profile result. For p_no, specify one of `N`, `input=n`, `limit=m`, or `all`. When this option is omitted, `-pinput=0,limit=16` is enabled. Multiple p_no values can be specified in the `-p` option, separated by commas (,), for example `-p3,5,limit=10`. N …: The information of process number N is output first. If no information exists for process number N, the specification is ignored. If multiple Ns are specified, they are output in the specified order. input=n: Reads the information of n processes in descending order of cost. If 0 or a value exceeding the number of processes is specified for n, information for all processes is read. When this sub option is omitted, `input=0` is enabled. The sub options `input=n` and `limit=m` can be specified at the same time. limit=m: Outputs the information of m processes in descending order of cost. The information of processes that are not output is still included in the denominator when ratios are calculated. If 0 or a value exceeding the number of input processes is specified for m, information for all processes is output. When this sub option is omitted, `limit=16` is enabled. all: Reads the information of all processes and outputs it in descending order of cost. This is the same as specifying `-pinput=0,limit=0`. This is enabled when neither the sub option `input=n` nor `limit=m` is specified.
-Tt_no	Specify the threads to output in the profile result. For t_no, specify one of `N`, `limit=n`, or `all`. Multiple t_no values can be specified in the `-T` option, separated by commas (,), for example `-T3,5,limit=10`. N[,N] …: The information of thread number N is output first. If no information exists for thread number N, the specification is ignored. If multiple Ns are specified, they are output in the specified order. limit=n: Outputs the information of n threads in descending order of cost. If 0 or a value exceeding the total number of threads is specified for n, information for all threads is output. all: Outputs information for all threads. This is the same as specifying `-Tlimit=0`. When omitted, `-Tall` is enabled.
-t{text|xml}	Specify the output format of the profile result. text: Outputs the profile result in text format. When omitted, `-ttext` is enabled. xml: Outputs the profile result in XML format.
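
The `src[:path]` item takes its directories as a colon-separated list. A small helper can build that item from a list of directories, as sketched below. The function name src_item and the directory names in the usage comment are placeholders, not part of fipppx.

```shell
# src_item DIR...
# Prints the "src" item for -I, with the directories joined by colons
# (src:dir1:dir2). With no arguments it prints plain "src".
# The function name is a placeholder, not part of fipppx.
src_item() {
  item=src
  for d in "$@"; do
    item="$item:$d"
  done
  printf '%s\n' "$item"
}

# Example: call graph and source code information for two source trees,
# written in XML format to a file.
#   fipppx -A -d profiling_data -Icall,"$(src_item ./src ./lib)" -txml -o result.xml
```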

3.1.11.3.6.2. Profile result

The profile result consists of the following statistical information.
The information to output can be limited with the -I option of the fipppx command.
For details of each item, refer to “Profiler User’s Guide” - “2.2.2 Detail of Profile Result”.

Profile data measurement environment information
Statistical time information
CPU performance characteristics
Cost information (procedure cost distribution information, loop cost distribution information, line cost distribution information)
Call graph information
Source code information

