5.10. Large page space allocation
This section describes the large page library provided by the HPC extensions of the supercomputer Fugaku.
5.10.1. Overview
The large page function extends HugeTLBfs, a standard Linux feature, and allocates memory with a page size larger than that of normal pages (large pages) to application programs that handle large amounts of data. It reduces the cost of OS address translation and improves memory access performance.
On Fugaku compute nodes, normal pages are 64KiB and large pages are 2MiB. Large pages are enabled by default; you can enable or disable them with the environment variable XOS_MMM_L_HPAGE_TYPE.
The advantages and disadvantages of each page size are as follows.
Evaluation item | 64KiB | 2MiB |
---|---|---|
TLB miss rate | High | Low |
Memory initialization cost | Small | Large |
Memory usage efficiency | High | Low |
There are two paging methods: the demand paging method and the prepaging method.
The demand paging method allocates pages in main memory as needed, when a required page does not exist in main memory during application program execution. A physical page is allocated when the memory area is first accessed.
The prepaging method allocates pages in main memory in advance. Physical pages are allocated at the time the memory area is allocated.
Paging method | Access to memory inside / outside the NUMA node | Initial memory access |
---|---|---|
Demand paging method | Uses memory inside the NUMA node as much as possible (Cost: Low) | Loads pages into physical memory when the memory is accessed (Cost: High) |
Prepaging method | Uses memory regardless of whether it is inside or outside the NUMA node (Cost: High) | Loads pages into physical memory in advance (Cost: Low) |
Note
When spanning multiple CMGs in thread parallelization, the demand paging method is recommended. Please refer to the “Paging Policy of Large Page” section in the “Programming Guide (Programming common part)”.
The memory areas and whether each is a target of large page conversion are shown below.
Area | Target of large page conversion |
---|---|
Text (.text) area | No (64KiB page) |
Static data (.data) area | Yes (2MiB page) |
Static data (.bss) area | Yes (2MiB page) |
Dynamic memory allocation area (heap area) | No (64KiB page) |
Thread heap area | Yes (2MiB page) |
Stack area | Yes (2MiB page) |
Thread stack area | Yes (2MiB page) |
Dynamic memory allocation area (mmap area) | Yes (2MiB page) |
Shared memory | No (64KiB page) |
5.10.2. Environment variable for large page library setting
Variable name | Specified value | Default value |
---|---|---|
XOS_MMM_L_HPAGE_TYPE | hugetlbfs\|none | hugetlbfs |
XOS_MMM_L_LPG_MODE | base+stack\|base | base+stack |
XOS_MMM_L_PRINT_ENV | on\|off\|1\|0 | 0 |
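As a minimal sketch of how these variables might be set in a job script, the lines below disable large pages for one run and turn on XOS_MMM_L_PRINT_ENV. What exactly XOS_MMM_L_PRINT_ENV prints is not described in this section, so treat its effect as an assumption to confirm in the guide. The program name ./a.out is hypothetical and is left commented out.

```shell
# Use normal (64KiB) pages instead of large pages for this run.
export XOS_MMM_L_HPAGE_TYPE=none
# Assumption: "on" asks the library to report its settings at startup.
export XOS_MMM_L_PRINT_ENV=on

# Show the large page variables that the program will inherit.
env | grep '^XOS_MMM_L_' | sort

# ./a.out   # hypothetical executable linked with the large page library
```

The variables must be exported before the program starts, because the library reads them from the process environment.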
The environment variable XOS_MMM_L_PAGING_POLICY sets the paging method for each of the following memory areas: the static data (.bss) area, the stack / thread stack area, and the dynamic memory allocation area. The environment variable for setting the paging method is shown below.
Variable name | Specified value | Default value |
---|---|---|
XOS_MMM_L_PAGING_POLICY | [demand\|prepage]:[demand\|prepage]:[demand\|prepage] | demand:demand:prepage |
Note
On Fugaku, the default value of XOS_MMM_L_PAGING_POLICY is customized. It differs from the default value "prepage:demand:prepage" given in the "Job Operation Software End-user's Guide for HPC Extensions".
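As a hedged illustration, the sketch below selects demand paging for all three areas, which is the method recommended in the overview when threads span multiple CMGs. It assumes that the three colon-separated fields correspond, in order, to the .bss area, the stack / thread stack area, and the dynamic memory allocation area, following the order in which the text lists them; confirm this in the guide before relying on it. The shell variable names policy, bss, stack, and heap are illustrative only.

```shell
# Demand paging for every area (assumed field order: .bss : stack : heap).
policy="demand:demand:demand"

# Split the value into its three fields to make the mapping visible.
IFS=: read -r bss stack heap <<EOF
$policy
EOF

export XOS_MMM_L_PAGING_POLICY="$policy"
echo ".bss=$bss stack=$stack heap=$heap"
```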
The environment variables for tuning large page allocation are shown below.
Variable name | Specified value | Default value |
---|---|---|
XOS_MMM_L_ARENA_FREE | 1\|2 | 1 |
XOS_MMM_L_ARENA_LOCK_TYPE | 0\|1 | 1 |
XOS_MMM_L_MAX_ARENA_NUM | Integer from 1 to INT_MAX (decimal) | 1 |
XOS_MMM_L_HEAP_SIZE_MB | Integer from MALLOC_MMAP_THRESHOLD_ × 2 to ULONG_MAX, in MiB (decimal) | MALLOC_MMAP_THRESHOLD_ × 2 |
XOS_MMM_L_COLORING | 0\|1 | 1 |
XOS_MMM_L_FORCE_MMAP_THRESHOLD | 0\|1 | 0 |
MALLOC_CHECK_ | 0\|1\|2\|3\|5\|7 (decimal) | 3 |
MALLOC_TOP_PAD_ | Integer from 0 to ULONG_MAX, in bytes (decimal) | 131072 (= 128KiB) |
MALLOC_PERTURB_ | Integer from INT_MIN to INT_MAX (decimal) | 0 |
MALLOC_MMAP_MAX_ | Integer from INT_MIN to INT_MAX (decimal) | 2097152 (= 2 × 1024 × 1024) |
MALLOC_MMAP_THRESHOLD_ | Integer from 0 to ULONG_MAX, in bytes (decimal or hexadecimal) | 134217728 (= 128MiB) |
MALLOC_TRIM_THRESHOLD_ | Integer from 0 to ULONG_MAX, in bytes (decimal or hexadecimal) | 134217728 (= 128MiB) |
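The units in this table are easy to mix up, since most values are in bytes while XOS_MMM_L_HEAP_SIZE_MB is in MiB. The short sketch below converts the default MALLOC_MMAP_THRESHOLD_ to MiB and derives the corresponding lower bound for XOS_MMM_L_HEAP_SIZE_MB. It assumes that "MALLOC_MMAP_THRESHOLD_ × 2" means twice the threshold expressed in MiB; the shell variable names are illustrative only.

```shell
# Default MALLOC_MMAP_THRESHOLD_ from the table, in bytes.
threshold_bytes=134217728

# Convert bytes to MiB (1 MiB = 1024 * 1024 bytes).
threshold_mib=$((threshold_bytes / 1024 / 1024))

# Assumed lower bound and default of XOS_MMM_L_HEAP_SIZE_MB, in MiB.
heap_min_mib=$((threshold_mib * 2))

echo "threshold: ${threshold_mib} MiB, minimum heap size: ${heap_min_mib} MiB"
```

Under that assumption, raising MALLOC_MMAP_THRESHOLD_ also raises the smallest value accepted for XOS_MMM_L_HEAP_SIZE_MB.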
5.10.3. Large page library linking
To use large pages from a user program, the large page library must be linked to the executable file.
With the Fujitsu compiler, the large page library is linked by default (the -Klargepage option is enabled by default).
With a compiler other than Fujitsu's, you can create an executable file that uses large pages by linking the large page library libmpg.so.
The large page library paths are as follows.
Node type | Path name |
---|---|
Login node | /opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/lib64 |
Compute node | /opt/FJSVxos/mmm/lib64 |
The large page library also provides a linker script at the following path. The script places the static data (.data) and static data (.bss) areas on large pages. Pass this linker script to the linker through the compiler.
Node type | Path name |
---|---|
Login node | /opt/FJSVxos/devkit/aarch64/rfs/opt/FJSVxos/mmm/util/bss-2mb.lds |
Compute node | /opt/FJSVxos/mmm/util/bss-2mb.lds |
The following is an example of compiling with gcc on a compute node.
gcc -Wl,-T/opt/FJSVxos/mmm/util/bss-2mb.lds -L/opt/FJSVxos/mmm/lib64 -lmpg sample.c
See also
If the application program is compiled as a PIE (Position Independent Executable), the .data / .bss area is not converted to large pages. In this case, the large page library only outputs a warning log, normal pages are used for the .data / .bss area, and the application program continues to run. To avoid this, explicitly disable the PIE format when compiling, for example with the gcc -no-pie option.
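A minimal sketch of a non-PIE build on a compute node is shown below. It only combines the paths and options already given in this section with the gcc -no-pie option; the shell variable names MMM_ROOT, LPG_FLAGS, and CMD are illustrative, and the command is printed rather than executed so that the sketch can be read on any machine.

```shell
# Compute-node install prefix of the large page library (from the tables above).
MMM_ROOT=/opt/FJSVxos/mmm

# Linker script for .data / .bss, library search path, and the library itself.
LPG_FLAGS="-Wl,-T${MMM_ROOT}/util/bss-2mb.lds -L${MMM_ROOT}/lib64 -lmpg"

# -no-pie asks gcc to produce a non-PIE executable.
CMD="gcc -no-pie ${LPG_FLAGS} sample.c"

# Print the command; on a compute node it could be run with: eval "$CMD"
echo "$CMD"
```

On a login node, the same sketch would need the login-node paths listed in the tables instead.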
5.10.4. Mitigating Memory Fragmentation
- XOS_MMM_L_HPAGE_TYPE=none
  Uses normal pages instead of large pages.
  Memory fragmentation occurs when contiguous free memory is broken up into small free areas scattered across memory. Large pages, which require a large contiguous area (page size of 2MiB or more), are therefore more susceptible to it.
  With normal pages, smaller contiguous areas (page size of 64KiB) can also be used, which mitigates the impact of memory fragmentation.
  The disadvantage is that the TLB miss rate may increase, potentially degrading memory access performance.
- XOS_MMM_L_HUGETLB_FALLBACK=1
  If large pages run short, the library attempts to allocate the missing memory with normal pages.
  If the required memory cannot be allocated even with normal pages, the process terminates due to memory exhaustion.
  This serves as a remedy for memory shortage when using large pages.
  This environment variable takes effect only when all of the following conditions are met. Otherwise, the library behaves as if the default value of 0 (disabled) were specified.
  - XOS_MMM_L_HPAGE_TYPE=hugetlbfs
  - XOS_MMM_L_PAGING_POLICY=*:*:prepage
  - XOS_MMM_L_ARENA_LOCK_TYPE=1
  - XOS_MMM_L_MAX_ARENA_NUM=1

  The disadvantage is that checking whether normal page allocation is needed adds overhead, which degrades memory allocation performance.
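The conditions for XOS_MMM_L_HUGETLB_FALLBACK can be checked before a job starts. The sketch below is one way to do that, assuming that unset variables take the default values listed in section 5.10.2. The function name fallback_conditions_met is hypothetical and is not part of the large page library.

```shell
# Return 0 if the four documented conditions for the fallback hold.
# Unset variables are assumed to take the defaults from the tables in 5.10.2.
fallback_conditions_met() {
  [ "${XOS_MMM_L_HPAGE_TYPE:-hugetlbfs}" = "hugetlbfs" ] || return 1
  case "${XOS_MMM_L_PAGING_POLICY:-demand:demand:prepage}" in
    *:*:prepage) ;;          # third field must be prepage
    *) return 1 ;;
  esac
  [ "${XOS_MMM_L_ARENA_LOCK_TYPE:-1}" = "1" ] || return 1
  [ "${XOS_MMM_L_MAX_ARENA_NUM:-1}" = "1" ] || return 1
  return 0
}

# Request the fallback and set the prerequisites explicitly.
export XOS_MMM_L_HUGETLB_FALLBACK=1
export XOS_MMM_L_HPAGE_TYPE=hugetlbfs
export XOS_MMM_L_PAGING_POLICY=demand:demand:prepage
export XOS_MMM_L_ARENA_LOCK_TYPE=1
export XOS_MMM_L_MAX_ARENA_NUM=1

if fallback_conditions_met; then
  echo "fallback conditions met"
else
  echo "fallback would be ignored"
fi
```

Setting the prerequisites explicitly, rather than relying on defaults, keeps the job script robust if a site-wide default changes.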