1. Introduction
1.1. Purpose of this document
This document describes the basic usage of the Supercomputer Fugaku.
1.2. Notation used in this document
In command examples, the prompt indicates whether the command is run on the user terminal, a login node, or a computing node.
Prompt        Control target
[terminal]    Run the command on the user's device
[_LNlogin]    Run the command on a login node (common)
[_LNIlogin]   Run the command on a login node (Intel)
[_LNAlogin]   Run the command on a login node (Arm)
[_CNlogin]    Run the command on a computing node
The home directory is indicated by ~ (tilde).
Unless otherwise specified, the language environment is described based on the latest version of its functions.
1.3. Abbreviations and aliases
The abbreviations and aliases used in this document are as follows.
Name                                                Abbreviations and aliases
Next-generation ultra-high-speed computer system    Supercomputer Fugaku
Computing node                                      CN
BIO and computing node                              CN/BIO
SIO and computing node                              CN/SIO
GIO and computing node                              CN/GIO
Login node and file transfer node                   Login node or LN
Storage connected to BIO and computing node         System disk
Storage connected to SIO and computing node         First-layer storage, First tier SSD, or SSD
1.4. About trademarks
Company names and product names in this document may be trademarks or registered trademarks of their respective companies. Please note that trademark symbols (TM, (R)) are not always attached to the system names, product names, and other names described in this document.
1.5. Change log
The update history of this document is listed below.
Version 1.49 May 14, 2025
Added “5.20.2. Job Execution in Startup Project (trial)”.
Version 1.48 April 4, 2025
Updated description of Maximum amount of job memory in “3.4.12. Estimating the amount of memory available to user programs”.
Updated example display of pjacl command in “5.1.3. Create job script”.
Added “5.10.4. Mitigating Memory Fragmentation”.
Added “Attention” regarding the time lag until the results of the job_events command are reflected in “5.19.1. job_events”.
Updated “Setting value (Default)” in “7.1. Overview”.
Updated “Setting value” in “Attention” of “7.1. Overview”.
Updated “Default value” in “7.2.1. Function overview”.
Updated “Use example” in “7.2.2. Use example”.
Fixed link of “Programming Guide(IO part)” in “8.1. Overview”.
Updated sample code for “8.2.1. When writing from one process to one file”.
Updated sample code for “8.2.2. When multiple processes write to a file”.
Version 1.47 February 25, 2025
Added a sample Certificate Manager screen to “4.2.3. Installing the certificate to Chrome (Windows)”.
Added a note to “4.4.3.1. Login node”.
Updated “Attention” in “7.1. Overview”.
Version 1.46 December 27, 2024
Updated the description of Darshan in “8.7. I/O profiling” and “8.8. I/O optimization”.
Version 1.45 November 27, 2024
Updated “Use of tmpfs area (/worktmp/)” in “3.4.5. Disk”.
Updated “3.4.5.2. Method of using tmpfs area (/worktmp/)”.
Updated “5.9.2.2. Low priority jobs”
Removed the description of show_affected_jobs from “5.19. Job list display command affected by the failure”.
Updated “5.19.1. job_events”.
Modified the sample code for “8.2.1. When writing from one process to one file”.
Modified the sample code for “8.2.2. When multiple processes write to a file”.
Version 1.44 November 1, 2024
Added “Attention” in “Use of tmpfs area” of “3.4.5. Disk”.
Updated “Attention” in “5.9.7. Specifying an execution start time”.
Version 1.43 October 17, 2024
Updated “Note” in “7.1. Overview”.
Updated “Attention” in “7.1. Overview”.
Updated “Attention” in “7.3.5. Power control point”.
Version 1.42 September 4, 2024
Added an explanation of the password when using the Mac keychain to “4.3. Accessing steps to the Fugaku website”.
Version 1.41 July 22, 2024
Updated “5.19.1. job_events”.
Renamed the section from “8. Layered storage and LLIO” to “8. Layered storage”.
Updated “8.1. Overview”.
Renamed the section from “8.2. About Writing Files to FEFS/LLIO” to “8.2. About File Operations to FEFS/LLIO”.
Updated “8.2.2. When multiple processes write to a file”.
Added “8.2.3. Rename File”.
Added “8.2.4. Unlink File”.
Updated “8.3.2. Writing timing to second-layer storage”.
Updated “8.3.4. Asynchronous close / synchronous close”.
Updated “8.6. Important Notices”.
Version 1.40 June 7, 2024
Added a note to “5.10.1. Overview”.
Updated “8.1. Overview”.
Added “8.2. About Writing Files to FEFS/LLIO”.
Updated “8.3. Cache Area of Second-Layer Storage”.
Added “8.3.2. Writing timing to second-layer storage”.
Updated “8.3.4. Asynchronous close / synchronous close”.
Updated “8.4. Node Temporary Area”.
Updated “8.5. Shared temporary area”.
Updated “8.6.2. MPI-IO”.
Version 1.39 May 30, 2024
Updated “6.4.2. Output list by option specification”.
Version 1.38 April 4, 2024
Added “5.19.3. show_evict_node”.
Updated “Attention” in “8.2.1. The Cache Area of Second-Layer Storage size”.
Updated “Attention” in “8.2.3. Asynchronous close / synchronous close”.
Version 1.37 February 2, 2024
Updated the description of the available capacity in “3.4.5.2. Method of using tmpfs area”.
Added description of Maximum amount of job memory in “3.4.12. Estimating the amount of memory available to user programs”.
Version 1.36 January 10, 2024
Reviewed the “4.6.2. Example of command use”.
Rewrote “user_name” as “username” to make the notation consistent.
Rewrote “<username>” as “username” to make the notation consistent.
Rewrote “group_name” as “groupname” to make the notation consistent.
Rewrote “<groupname>” as “groupname” to make the notation consistent.
Version 1.35 December 12, 2023
The name https://www.fugaku.r-ccs.riken.jp/en/ was changed from “the user portal” to “the Fugaku website”.
History retention period changed to 90 days in “5.12.6. Job status display command options”.
Version 1.34 October 6, 2023
Updated the list in “2.2. Manual” due to discrepancies between the listed manuals and the manuals published on the Fugaku website.
Added description of /vol0002 in “3.4.7. File system”.
Updated “Attention” in “3.4.5. Disk”.
Updated “Attention” in “3.4.6. File creation and stripe setting”.
Added “5.9.2.2. Low Priority Jobs”.
Updated the description of “total bytes of packets sent and received” in “5.18. Obtaining TofuD TNR statistics”.
Added “5.19.1. job_events”.
Corrected the notation of the data area path from “/data” to “/vol0n0m/data”.
Corrected the mistakenly written “/vol0m0n/group/data” to “/vol0n0m/data/group”.
Version 1.33 June 19, 2023
Added the description of --gname option in “5.1.4. The command for creating template of job script”.
Added “5.1.5. Job allocation operation”.
Added a note to “5.9.2.1. Running a job with a minimum execution time”.
Added the description of -g option in “5.19. Job list display command affected by the failure”.
Version 1.32 June 2, 2023
Changed the description of “5.9.2. Resource Specification”.
Added “5.9.2.1. Running a job with a minimum execution time”.
Version 1.31 April 25, 2023
Added a reference page for vol0002 to the second-layer storage of “3.4.7. File system”.
Version 1.30 April 4, 2023
Updated an Attention for “7.3.5. Power control point”.
Updated the description of perf in “8.2.5. Option when Job submitting (pjsub --llio)”.
Updated the description of perf in “8.3.3. Job submitting option (pjsub --llio)”.
Updated the description of perf in “8.4.4. Job submitting option (pjsub --llio)”.
Removed the restrictions on “--llio perf” from “8.5. Important Notices”.
Version 1.29 March 22, 2023
Updated the puttygen screen examples in “4.4.1.2. Windows (PuTTYgen)”.
Updated the PuTTY screen examples in “4.4.3.2. Login node (PuTTY)”.
Version 1.28 March 15, 2023
Added a reference page for using RSA keys to “4.4.1. Private key/Public key creation”.
Fixed incorrect description ‘Post K’ in ‘9.5.1. Overview’ to ‘Fugaku’.
Partially changed the description of the sample scripts. Scripts written in the previous style still work without problems.
Version 1.27 February 8, 2023
Changed the job script description example for “5.8.1.”.
Added Darshan description to “8.6. I/O optimization” and divided it into “8.6. I/O profiling” and “8.7. I/O optimization”.
Version 1.26 January 5, 2023
Changed consolidation script for “8.6.1. Analysis of bottlenecks using LLIO performance information”.
Version 1.25 November 14, 2022
Changed volume number for “3.4.5. Disk”.
Changed volume number for “3.4.7. File System”.
Added new message description to “5.2.1. Submitting a Job”
Updated the description of retention_state in “7.1. Overview”
Updated “8.5. Important Notices”.
Changed volume number for “8.7. Selecting a usage file system (volume)”.
Changed volume number for “8.7.1. Environment Variables and pjsub Options”.
Added environment variable “PJM_LLIO_SHAREDTMP” to “8.7.1. Environment Variables and pjsub Options”.
Version 1.24 October 20, 2022
Updated “6.4. Standard output / Standard error output / Standard input”.
Version 1.23 October 11, 2022
Added “5.1.4. The command for creating template of job script”.
Updated “5.16.3. PJM 0079 ERROR REASON list”.
Updated an Attention for “8.2.4. Common file distribution function (llio_transfer)”.
Version 1.22 September 20, 2022
Added “5.9.7. Specifying an execution start time”.
Version 1.21 July 28, 2022
Changed the description in “7.1. Overview” of the power knob value for compute nodes that also serve as I/O nodes.
Version 1.20 July 11, 2022
Added links to pages related to “5.1.3. Create job script”.
Added examples of messages issued by the pjsub command to “5.2.1 Submitting a job”.
Renamed the section from “5.16.3. GATE CHECK ERROR REASON list” to “5.16.3. PJM 0079 ERROR REASON list”.
Updated “8.5. Important Notices”.
Version 1.19 June 22, 2022
Added flow diagram to “4.1 Overview”.
Version 1.18 June 9, 2022
Noted in “4.4.1 Creating a Private/Public Key Pair” that the use of RSA keys is planned to be prohibited.
Version 1.17 May 26, 2022
Updated “3.4.8. Group”.
Version 1.16 May 24, 2022
Added a note about permissions to “4.4 Login”.
Version 1.15 May 16, 2022
Updated “3.3.4. Update process”.
Updated “3.4.4. Resource group use status”.
Updated the explanation of directory names in “3.4.5. Disk”.
Updated the display example in “3.4.5.1.2. File sharing examples of using ACL”.
Updated the explanation of the second-layer storage in “3.4.7. File system”.
Added “3.4.8. Group”.
Updated “3.4.10. Login node”.
Added a note about using Chrome on Mac in “4.3 Accessing steps to the Fugaku website”.
Updated “8.5.1.3. Direct access to second-layer storage”.
Updated “8.7. Selecting a usage file system (volume)”.
Updated “8.7.2. Job submission method”.
Version 1.14 April 3, 2022
Updated “3.4.12. Definition of used computational resource”.
Added “4.4.6. E-mail distribution of Fugaku operation information”.
Updated “5.12.5.3. Performance information output”.
Updated the explanation of “-H” option in “5.12.6. Job status display command options”.
Updated the explanation of “-g” option in “5.19. Job list display command affected by the failure”.
Updated “7.1. Overview”.
Added an “Attention” to “8.2. Cache Area of Second-Layer Storage”.
Updated the explanation of “--no-check-directory” option in “8.7.1. Environment Variables and pjsub Options”.
Version 1.13 January 14, 2022
Updated the attention in “3.4.5. Disk”.
Added “Method of specifying the job name” in “5.3.1. Job submission”.
Version 1.12 December 16, 2021
Updated “3.4.5. Disk”.
Added an “Attention” to “8.2.2.2. Stripe setting to second-layer storage”.
Updated “8.5. Important Notices”.
Deleted “8.6.2. Optimizes I/O from multiple processes to the same file”.
Added “8.7. Selecting a usage file system (volume)”.
Version 1.11 December 7, 2021
Updated the “[utility.sh]” example in “5.7.3. Worker process creation request to Agent prosess”.
Updated the “[Master program master_pjaexe.sh]” and “[utility.sh]” example in “5.7.4. Worker process generation by pjaexe command”.
Added an “Attention” about pjaexe to “5.7.5. Notes on job creation”.
Added cases where master-worker jobs end to “5.7.6.1. Impact to job work”.
Added “5.8.1. Tools to reduce search load on dynamic libraries(sort_libp)”.
Fixed incorrect sample program for “6.10.4. Function use example”.
Added a note about message of “file transfer error information” to “8.2.3. Asynchronous close / synchronous close”.
Added “8.5.1. Notes on High Parallel Jobs (1000 or more parallel)”
Added “8.6. I/O optimization”.
Version 1.10 September 28, 2021
Updated “3.4.7.1. Client cache of the compute node and IO peformance”.
Improved the description of “8.2.4.3. Tool(dir_transfer) to transfer directories using llio_transfer command”.
Updated “8.5. Important Notices”.
Version 1.09.1 September 9, 2021
Fixed incorrect description of retention_state on the IO node in “7. Power control function”.
Version 1.09 September 9, 2021
Added the description of the 2ndfs area to “3.4.5. Disk”
Added the description of the 2ndfs area to “3.4.7. File system”
Added an “Attention” to “5.1.3. Create job script” about characters that can be used in job script filenames.
Updated “5.2.2. Refer to job execution result”
Added description of available characters for job name in “5.9.1. Basic option”
Added “5.20.1. How to use pjrsh command” to “5.20. Note”.
Updated “6.4.1. How to specify standard output / standard error output / standard input”
Modified the description of retention_state in “7. Power control function” to fit the operation.
Updated “8.1.1. IO time reduction and area selection”
Renamed the section from “8.1.2. Reduction of access time to execution modules” to “8.1.2. Simultaneous access to common files from all processes”
Updated “8.2. Cache Area of Second-Layer Storage”
Updated “Attention” of “8.2.1. The Cache Area of Second-Layer Storage size”
Updated “8.2.2.1. Stripe setting to second-layer storage cashe”
Updated command execution example of “8.2.3. Asynchronous close / synchronous close”
Updated “llio_transfer Command usage examples” of “8.2.4. Common file distribution function (llio_transfer)”
Updated “8.2.5. Option when Job submitting (pjsub --llio)”
Updated “8.3. Node Temporary Area”
Updated “8.3.3. Job submitting option (pjsub --llio)”
Updated “8.3.5.2. Unzip of archive files”
Updated “8.4. Shared temporary area”
Updated “8.4.1. Stripe setting for shared temporary area”
Updated “Attention” of “8.4.2. Shared temporary area size”
Updated “8.4.4. Job submitting option (pjsub --llio)”
Updated “8.5. Important Notices”
Version 1.08 July 21, 2021
Added “5.18. Obtaining TofuD TNR statistics”.
Removed “--llio perf” from “llio_transfer Command usage examples” in “8.2.4. Common file distribution function (llio_transfer)”.
Added “8.2.4.2. Tips for common file distribution”.
Added a note about “--llio perf” to “8.5. Important Notices”.
Version 1.07 June 25, 2021
Added a note of mpiexec for “6.2. Execution command format”.
Updated the notes on “8.2.3. Asynchronous close / synchronous close”.
Added link to “8.2.3. Asynchronous close / synchronous close” in the “8.5. Important Notices”.
Version 1.06 June 21, 2021
Added description about failures where resource is refundable in “5.18. Job list display command affected by the failure”.
Added “7.3.6. Note”.
Added “8.2.4.1. Effects of the common file”.
Added “8.5.1. MPI-IO”.
Version 1.05 June 11, 2021
Updated “3.4.12. Definition of used computational resource”.
Added “5.9.6. Environment variables in MPI processes”.
Version 1.04 June 3, 2021
Deleted “3.4.2. Limit value and resource group”.
Added description of node allocation method to “3.4.1. Compute node”.
Updated “3.4.12. Definition of used computational resource”.
Added a note to “5.5.2. Command format” that the recommended wait-time is 60 seconds or longer.
Added “5.7. Master-worker type job”.
Removed resource group description from “5.8.2. Resource specification” and made it a link to another page.
Version 1.03 May 31, 2021
Updated “Attention” of “3.4.6. Disk”.
Updated “3.4.8.1. Client cache of the compute node and IO peformance”.
Fixed the default elapsed time for interactive jobs in “5.5.2. Command format” to match the current settings.
A fifth point was added to “8.2.3. Asynchronous close / synchronous close”.
Added “8.3.5. Usage example of node temporary area”.
Version 1.02 May 17, 2021
Updated the explanation about elapsed time of job in “3.4.13. Definition of used computational resource”.
Updated the explanation of “-g” option in “5.17. Job list display command affected by the failure”.
Added “8.1.1. IO time reduction and area selection”.
Added “8.1.2. Reduction of access time to execution modules”.
Updated “8.3.2. How to use”.
Version 1.01 April 1, 2021
The host name described in “4.4.3.3. How to directly specify a login node” was changed according to the configuration change of the login node.
Added the explanation according to the operation to “5.8.2. Resource specification”.
Added to “7. Power control function” that the setting value of the power knob (freq) of the IO node (CN/BIO, CN/SIO, CN/GIO) is different from the compute node (CN).
Added to “8.2.4. Common file distribution function (llio_transfer)” that only read-only files can be treated as common files.
Updated “8.5 Important Notices” to match the current operation.
Version 1.00 March 9, 2021
Updated step 1 of “4.2.3. Installing the certificate to Chrome (Windows)”.
Updated “4.4.2. Public key registration”.
Changed the host name of login node of “4.4.3. Accessing direction” and “4.4.4. File transfer method”.
Changed the names of the resource groups used in the example scripts in “5. Job execution”.
Updated the “Attention” in “8.2.4. Common file distribution function (llio_transfer)”.
Version 0.16 February 22, 2021
The host name described in “4.4.3.3. How to directly specify a login node” was changed according to the configuration change of the login node.
Version 0.15 February 18, 2021
Added “Programming Guide” to “2.2. Manual”.
Fixed “3.2. Use scale/Use environment” to match the current operation.
Fixed “3.4. Resource” to match the current operation.
Added to “5.15.3. GATE CHECK ERROR REASON list” that there is an unused REASON.
Version 0.14 February 1, 2021
Changed the default value of elapse in “5.1.3. Create job script” and added a note about the lower limit value of elapse.
Deleted the incorrect description in “5.8.1. Basic option” of the behavior when --mail-list is not specified.
The output of maximum and minimum values was deleted from “5.11.5.1. Electric power information” due to the operation changes.
Added to “8.5. Important Notices” that the same file can not be used from more than 1,152 nodes.
Version 0.13 January 12, 2021
Added idcheck command to “3.4.11. I/O node”.
Added “5.17. Job list display command affected by the failure”.
Version 0.12 December 15, 2020
Added description about resource default values to “5.1.3. Create job script”.
Deleted the description that resource group specification is required from “3.4.2. Limit value and resource group” and “5.8.2 Resource specification” because resource group specification is optional.
The description of rscunit was deleted from the job script example, because it was no longer necessary to specify the resource unit.
Version 0.11 November 30, 2020
Added an “Attention” stating that “Setting a resource group name is necessary to submit a job” in “3.4.2. Limit value and resource group” and “5.8.2. Resource specification”.
Added “8. Layered storage and LLIO”.
Version 0.10 November 2, 2020
Added tmpfs area to “3.4.6. Disk”.
Added a note when setting stripes on the compute node in “3.4.7. File creation and stripe setting”.
Added “3.4.8.1. Record Length and Read Performance”.
Deleted an incorrect description that “Make sure to specify over 12 nodes” in “5.4.2. Bulk job script”.
Version 0.9 September 25, 2020
Updated the resource group information of “3.4.2. Limit value and resource group”.
Version 0.8 September 7, 2020
Updated the resource group information of “3.4.2. Limit value and resource group”.
Added a description of the difference between the two setfacl command examples in “3.4.6.2. File sharing examples of using ACL”.
Added “3.4.13. Definition of used computational resource”.
Added a note to “5.11.1.4. Job status items” that users should delete unnecessary hold jobs themselves.
Version 0.7 August 6, 2020
Updated resource group information in “3.4.2. Limit value and resource group”
Added a description of the environment variable PLE_MPI_STD_EMPTYFILE and a note on the case where the number of files increases in “5.2.2. Refer to job execution result”.
Added a description to “5.11.1.4. Job status items” that users should delete error jobs themselves.
Added a description of the environment variable PLE_MPI_STD_EMPTYFILE to “6.4.3. About standard output / standard error output when executing large-scale jobs”.
Version 0.6 July 22, 2020
Fixed a description error in the pjsub command options described in “5.5.1. Job submission”.
Version 0.5 July 14, 2020
Added description of share directory for data sharing between groups to “3.4.6. Disk”
Added a description of the file names to “5.2.2. Refer to job execution result”, because a file for standard output/standard error output is created for each mpiexec command.
Version 0.4 July 1, 2020
Updated resource group information in “3.4.2. Limit value and resource group”
Deleted the note of “3.4.2. Limit value and resource group” because the upper limit of the number of MPI processes has been released.
Added “3.4.11. I/O node”. Introduced a compute node that also serves as an I/O node and described how to identify it based on NODE ID.
Added “3.4.12. Estimating the amount of memory available to user programs”. The estimation formula is described.
Changed the note about browser in “4.3. Accessing steps to the Fugaku website”.
Version 0.3 June 9, 2020
Updated resource group information in “3.4.2. Limit value and resource group”
Corrected the interval for obtaining power information in “5.11.5.1. Electric power information”.
Added an example of outputting by changing the directory for each 1000 ranks in “6.4.3. About standard output / standard error output when executing large-scale jobs”.
Version 0.2 May 15, 2020
Updated resource group information in “3.4.2. Limit value and resource group”
The recommended number of stripes in “3.4.7. File creation and stripe setting” was not appropriate, so the description was removed; guidance will be provided in the future.
Added description that there is no logout function in “4.3. Accessing steps to the Fugaku website”.
Added description of the FQDN for each login node in “4.4.3.3. How to directly specify a login node”.
Added “4.4.3.4. Arm login node”.
Added information about the interval for obtaining power information in “5.11.5.1. Electric power information” and “7.3.4. Electric power measurement point”
Added hybrid parallel example in “5.16. Sample Scripts”
Changed the contents of “7.3.3. Use direction Power API from within the program” to the method of compiling the sample program.