3.1.11.5. Notes

The following notes apply to all profilers. For notes specific to each profiler (Instant Performance Profiler, Advanced Performance Profiler, CPU Performance Analysis Report), refer to “5 Notes” in the “Profiler User’s Guide”.

3.1.11.5.1. Compile options

The following compile options may cause the profiler to behave in unintended ways.

Compile options	Description
-O0	For C or C++ programs, specify an optimization level of `-O1` or higher at compile time in order to measure loop information with the Instant Performance Profiler.
-Nnoline, -ffj-no-line	If the compile option `-Nnoline` or `-ffj-no-line` is enabled, the cost cannot be calculated with the `-Puserfunc` option of the `fipp` command. For such programs, specify the `-Pnouserfunc` option so that the cost is calculated correctly. In addition, when `-Nnoline` or `-ffj-no-line` is enabled, the cost of the MPI library cannot be measured.
-Ncoarray	If the compile option `-Ncoarray` is enabled, note the following. The image number is the rank number or process number plus 1. The cost of the MPI library used by the COARRAY feature may be recorded.
-Klto, -flto	If the compile option `-Klto` or `-flto` is enabled, the following information output by the Instant Performance Profiler is not guaranteed. Call graph information: procedure names generated internally by link-time optimization may appear among the displayed procedure names. Source code information: the cost of each line may not be displayed correctly.

3.1.11.5.2. Avoiding long analysis times and high memory use in large parallel programs

When analyzing profile data from large parallel programs (fipppx -A, fapppx -A, fipp -A, and fapp -A), the analysis may take a long time or run out of memory.

During analysis, the following options of the fapppx command can be used to reduce the analysis time and memory use.

-p option: Reducing the number of processes to be analyzed reduces the memory required for the analysis.
-I option: Reducing the number of output items reduces memory use.

3.1.11.5.3. CPU frequency of compute nodes that also serve as IO

The CPU frequency of compute nodes that also serve as IO (CN/BIO, CN/SIO, and CN/GIO) is fixed at 2.2 GHz. Users cannot change it.

The CPU frequency of ordinary compute nodes (CN) is 2.0 GHz. Keep this difference in mind when evaluating performance.

Refer to “7. Power control function” of “Users Guide - Use and job execution -” for details of the CPU frequency.
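As a rough illustration of why the frequency difference matters, the sketch below rescales a time measured on a 2.2 GHz IO-serving node to an estimate for a 2.0 GHz compute node. It assumes run time is inversely proportional to clock frequency, which holds only for purely CPU-bound code; memory-bound or IO-bound sections will not scale this way. The function name `scale_time_to_cn` is hypothetical and not part of any profiler.

```python
# Clock rates (GHz) as stated in this section.
CN_IO_GHZ = 2.2  # CN/BIO, CN/SIO, CN/GIO (fixed)
CN_GHZ = 2.0     # ordinary compute nodes (CN)


def scale_time_to_cn(seconds_on_io_node: float) -> float:
    """Estimate the run time on a 2.0 GHz CN from a time measured at 2.2 GHz.

    Assumption (not from the guide): time is inversely proportional to
    clock frequency, i.e. the code is purely CPU-bound.
    """
    return seconds_on_io_node * CN_IO_GHZ / CN_GHZ


if __name__ == "__main__":
    # A 10 s measurement on an IO-serving node corresponds to roughly
    # 10 % more time on an ordinary compute node under this assumption.
    print(f"{scale_time_to_cn(10.0):.2f} s")
```

In practice it is simpler to compare only measurements taken on the same node type.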

3.1.11.5.4. CPU Binding

When measuring profile data, you must control CPU binding so that threads and CPUs have a one-to-one relationship.

If threads are not bound to CPUs, the profiler measurements may be incorrect.

In that case, the CPU Performance Analysis Report prints the following message and does not generate a report:

paXX.csv : The environment seems to not bind process to core.

For more information on CPU binding, see the following.

For thread-parallel programs, see the “Fortran User’s Guide”, “C User’s Guide”, or “C++ User’s Guide”.

For non-thread-parallel MPI programs (run with the mpiexec command), use a VCOORD file for CPU binding. To specify the VCOORD file, see the { -vcoordfile | --vcoordfile } option in the “MPI User’s Guide”. Set the number of CPUs (cores) to “1” (core=1) for all processes in the VCOORD file.

For programs that are neither thread-parallel nor MPI, use the taskset or numactl command for CPU binding. For more information, see the respective man pages.
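The one-to-one requirement can be checked before measuring. The following is a minimal sketch, assuming a Linux system and Python 3; it is not part of the profiler. The helper `is_one_to_one` is a hypothetical name: it takes one CPU affinity set per thread and reports whether every thread is pinned to exactly one CPU and no CPU is shared. `os.sched_getaffinity` is the standard-library call that returns the CPUs the caller is allowed to run on.

```python
import os
from typing import List, Set


def is_one_to_one(affinities: List[Set[int]]) -> bool:
    """Return True if each thread is pinned to exactly one CPU
    and no two threads share a CPU.

    `affinities` holds one set of allowed CPU ids per thread.
    """
    # Keep the single CPU id of every thread that is pinned to one CPU.
    pinned = [next(iter(a)) for a in affinities if len(a) == 1]
    all_pinned = len(pinned) == len(affinities)
    no_sharing = len(set(pinned)) == len(pinned)
    return all_pinned and no_sharing


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        allowed = os.sched_getaffinity(0)  # 0 = the calling process
        print("allowed CPUs:", sorted(allowed))
        print("bound to one CPU:", is_one_to_one([set(allowed)]))
```

Running this script under `taskset -c 0` should report a single allowed CPU, whereas an unbound run typically lists every CPU available to the job.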

3.1.11.5.5. CPU Performance Analysis Report Cache Miss Ratio

The cache miss ratio may fall outside the range 0% to 100% because of measurement error or measurement variation. A cache miss ratio greater than 100% is treated as 100%, and a ratio less than 0% is treated as 0%.

Refer to “4.2.2.5.2 Cache (Standard and Detail Reports)” in “Profiler User’s Guide”.
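The clamping rule described above can be written in a few lines. This is only a sketch for post-processing your own numbers, assuming ratios expressed in percent; the function name `clamp_miss_ratio` is hypothetical and does not come from the profiler.

```python
def clamp_miss_ratio(percent: float) -> float:
    """Clamp a cache miss ratio (in percent) to the range 0 to 100.

    Values above 100 are treated as 100 and values below 0 as 0,
    mirroring the rule stated in this section.
    """
    return min(100.0, max(0.0, percent))


if __name__ == "__main__":
    # Hypothetical raw ratios, including out-of-range values.
    for raw in (103.2, -1.5, 42.0):
        print(raw, "->", clamp_miss_ratio(raw))
```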
