3.1.11.5. Notes
The following notes apply to all profilers. For notes specific to each profiler (Instant Performance, Advanced Performance, CPU Performance Analysis Report), refer to “5 Notes” in the “Profiler User’s Guide”.
3.1.11.5.1. Compile option
Compiling a program with the following options may cause the profiler to behave in unintended ways.
Compile options | Description |
---|---|
-O0 | For a C or C++ program, to measure the loop information of the program with Instant Performance Profiler, specify an optimisation option ( more than |
-Nnoline, -ffj-no-line | If translation option If it is something like this program, enable with Also translation option |
-Ncoarray | If translation option |
-Klto, -flto | If translation option |
3.1.11.5.2. Avoiding long time or large memory use in large parallel programs
When analyzing large parallel programs (fipppx -A, fapppx -A, fipp -A and fapp -A), the analysis may take a long time or run out of memory.
The following options of the fapppx command can be used during analysis to avoid the long analysis time and memory shortage described above.
- -p option
By reducing the number of processes to be analyzed, the memory area required for the analysis can be reduced.
- -I option
By reducing the number of output items, the amount of memory use can be reduced.
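As a rough illustration of combining the two options, the following Python sketch assembles an `fapppx -A` command line. The helper name `build_fapppx_command`, the directory `./rep1`, the use of `-d` to name the profile data directory, and the values given to `-p` and `-I` are all assumptions made for illustration; check the “Profiler User’s Guide” for the forms each option actually accepts.

```python
import shlex


def build_fapppx_command(profile_dir, processes=None, items=None):
    """Assemble an `fapppx -A` command line as a list of arguments.

    processes: value handed to -p (placeholder format, an assumption)
    items:     value handed to -I (placeholder format, an assumption)
    """
    # -d naming the profile data directory is assumed here.
    cmd = ["fapppx", "-A", "-d", profile_dir]
    if processes is not None:
        # Fewer processes to analyze means less memory for the analysis.
        cmd += ["-p", processes]
    if items is not None:
        # Fewer output items means less memory use.
        cmd += ["-I", items]
    return cmd


# Hypothetical values: one process and a reduced item list.
cmd = build_fapppx_command("./rep1", processes="0", items="nompi")
print(shlex.join(cmd))
```

The sketch only builds and prints the command, so it can be inspected before being run on the system.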
3.1.11.5.3. CPU frequency of compute nodes that also serve as IO
The CPU frequency of compute nodes that also serve as IO (CN/BIO, CN/SIO and CN/GIO) is set to 2.2GHz. Users cannot change the CPU frequency of compute nodes that also serve as IO.
The CPU frequency of compute nodes (CN) is 2.0GHz. Keep this difference in mind when evaluating performance.
Refer to “7. Power control function” of “Users Guide - Use and job execution -” for details of the CPU frequency.
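To see why the difference matters, the short Python sketch below converts a cycle count into elapsed time at each frequency. It assumes the cycle count is the same on both node types, which is only roughly true for compute-bound code; the function name and the sample cycle count are hypothetical.

```python
CN_FREQ_GHZ = 2.0     # compute node (CN), from the text above
CN_IO_FREQ_GHZ = 2.2  # compute node that also serves as IO


def seconds_for_cycles(cycles, freq_ghz):
    """Return the time in seconds needed to execute `cycles` CPU cycles."""
    return cycles / (freq_ghz * 1e9)


# Hypothetical workload of 4.4 billion cycles.
cycles = 4_400_000_000
t_cn = seconds_for_cycles(cycles, CN_FREQ_GHZ)
t_io = seconds_for_cycles(cycles, CN_IO_FREQ_GHZ)
print(f"CN: {t_cn:.2f} s, CN that also serves as IO: {t_io:.2f} s")
print(f"ratio: {t_cn / t_io:.2f}")
```

Under that assumption the same work takes about 10% longer on a CN, so timings from the two node types should not be compared directly.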
3.1.11.5.4. CPU Binding
If processes are not bound to CPU cores, the CPU Performance Analysis Report prints the following message and does not generate a report:
paXX.csv : The environment seems to not bind process to core.
For more information on CPU binding, see the following.
For a thread-parallel program, see the “Fortran User’s Guide”, “C User’s Guide”, or “C++ User’s Guide”.
For an MPI program that is not thread-parallel (launched with the mpiexec command), use the VCOORD file for CPU binding.
See the { -vcoordfile | --vcoordfile } option in the “MPI User’s Guide” for how to specify the VCOORD file.
Set the number of CPUs (cores) to “1” (core=1) for all processes in the VCOORD file.
For a program that is neither thread-parallel nor MPI, use the taskset or numactl command for CPU binding. For more information, see the man pages.
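For the last case, binding to one core can also be done from a small launcher script. The Python sketch below is a Linux-only illustration of what `taskset -c <core>` does, not a replacement for the commands named above. The helper name `run_pinned` is hypothetical, and the probe command merely prints the child's allowed cores in place of a real program.

```python
import os
import subprocess
import sys


def run_pinned(cmd, core):
    """Run cmd as a child process bound to a single CPU core (Linux only)."""
    def bind():
        # Runs in the child just before exec; pid 0 means the calling process.
        os.sched_setaffinity(0, {core})

    return subprocess.run(
        cmd, preexec_fn=bind, capture_output=True, text=True, check=True
    )


# Pick a core that the current process is allowed to use.
core = min(os.sched_getaffinity(0))

# Stand-in for the real program: report the cores the child may run on.
probe = [sys.executable, "-c", "import os; print(sorted(os.sched_getaffinity(0)))"]
result = run_pinned(probe, core)
print(result.stdout.strip())
```

The child inherits the single-core affinity set in `bind`, so the probe reports exactly one core.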
3.1.11.5.5. CPU Performance Analysis Report Cache Miss Ratio
The cache miss ratio may fall outside the range 0% to 100% because of measurement error or measurement variation. A ratio greater than 100% is treated as 100%, and a ratio less than 0% is treated as 0%.
Refer to “4.2.2.5.2 Cache (Standard and Detail Reports)” in “Profiler User’s Guide”.
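The same rule can be applied when post-processing raw values. The Python sketch below is a minimal illustration; the function name `clamp_miss_ratio` and the sample values are hypothetical and are not taken from the report itself.

```python
def clamp_miss_ratio(ratio_percent):
    """Clamp a cache miss ratio given in percent to the range 0 to 100."""
    return max(0.0, min(100.0, ratio_percent))


# Hypothetical raw ratios, including out-of-range values.
raw = [103.2, -1.5, 42.0]
print([clamp_miss_ratio(r) for r in raw])
```

Values already inside the range pass through unchanged.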