Program

Feb 18, 2019 (Day 1)

9:00 - 9:30
Registration
Opening

9:30 - 9:35
Introduction of Symposium

9:35 - 9:50
Welcome Address
Satoshi Matsuoka (R-CCS)

PDF (click to create a new window)
Message from MEXT (Chair: Yutaka Ishikawa)

9:50 - 10:10
Japan’s Policy in Promotion of High Performance Computing

Suzuka Sakashita (MEXT)
Report from R-CCS (Chair: Makoto Tsubokura)

10:10 - 10:50
K and Post-K

Fumiyoshi Shoji and Yutaka Ishikawa (R-CCS)

Fumiyoshi Shoji's Slides (click to create a new window)

Yutaka Ishikawa's Slides (click to create a new window)
Invited Talk (Chair: Makoto Tsubokura)

10:50 - 11:30
Turbulent Combustion Simulation and In Situ Analytics on Titan and Summit with S3D-Legion

Jacqueline H. Chen (Sandia National Laboratory)

Abstract (click to expand)PDF (click to create a new window)

Turbulent combustion direct numerical simulation (DNS) is a first principles tool for understanding the combustion of complex fuels over a broad range of aerothermochemical conditions representative of practical engines for transportation and power generation. These simulations require the world’s leading supercomputers to numerically resolve the large dynamic range of flow and chemical scales and to perform in situ analytics side-by-side with the DNS. To facilitate performance portability and composable in situ workflows, the DNS code S3D-Legion, a re-implementation of S3D-MPI in a dynamic task-based asynchronous programming model will be described. Weak and strong scaling performance on Summit and Titan at ORNL and PizDaint at ETH will be presented.
11:30 - 11:45
Group Photo
11:45 - 13:00
Lunch
Distinguished Achievements in K and related to Post-K (1) (Chair: Takemasa Miyoshi)

13:00 - 14:00
Beating heart simulation driven by three dimensional molecular dynamics model

Takumi Washio (UT-Heart Inc.), Ryo Kanada (RIKEN), Xiaoke Cui (UT-Heart Inc.), Seiryo Sugiura (UT-Heart Inc.), Yasushi Okuno (Kyoto Univ.) , Toshiaki Hisada (UT-Heart Inc.)

Abstract (click to expand) PDF (click to create a new window)

At the start of K-computer project, we got a chance to use the whole system to compute one and half beats of multiscale heart simulation, in which the three levels (heart-cell-contractile proteins) were modeled and coupled. Though such a large-scale computation was not possible for daily simulations, the developed computational techniques and models, have been applied to the practical clinical applications later on. These experiences reminded us the importance of modeling further details of contractile proteins to compute more realistic movements of muscle and more precise energy consumption. Therefore, in Post-K computer project, we are developing the detailed three dimensional molecular mechanical models and the computational method to couple these models with the continuum mechanics. In this presentation, we will introduce the above mentioned our activities and discuss the parallel computational efficiency in such an application.

Using Artificial Intelligence and Transprecision Computing for Accelerating Finite-Element Urban Earthquake Simulation

Tsuyoshi Ichimura^1,2,3, Kohei Fujita¹, Takuma Yamaguchi¹, Akira Naruse⁴, Jack C. Wells⁵, Thomas C. Schulthess⁶, Tjerk P. Straatsma, Christopher J. Zimmer⁵, Maxime Martinasso⁶, Kengo Nakajima^7,3, Muneo Hori¹, Lalith Maddegedara¹; 1: Earthquake Research Institute & Department of Civil Engineering, The University of Tokyo; 2: AIP, RIKEN; 3: R-CCS, RIKEN; 4: NVIDIA Corporation; 5: Oak Ridge National Laboratory; 6: Swiss National Supercomputing Centre; 7: Information Technology Center, The University of Tokyo

Abstract (click to expand) PDF (click to create a new window)

To address problems that occur due to earthquakes in urban areas, we propose a method that utilizes artificial intelligence (AI) and FP16-FP21-FP32-FP64 transprecision computing to accelerate a nonlinear dynamic low-order unstructured finite-element solver. The AI is used to improve the convergence of iterative solver leading to 5.56-fold reduction in arithmetic count from a standard solver. The developed solver demonstrated high scalability and performance up to 4096 nodes of Summit, 4608 nodes of Piz Daint and 49152 nodes of K computer. The proposed approach utilizing AI and transprecision computing has implications for accelerating other implicit solvers used for earthquake city simulations as well as various fields.

Advancement of meteorological and global environmental predictions utilizing observational “Big Data”

Keiko Takahashi (Japan Agency for Marine-Earth Science and Technology)

Abstract (click to expand)

Accelerating of global warming, it is pointed out that the frequency and severity of extreme weather events are increasing around the world. These include severe heavy rain, the heat wave, tornados and very large typhoons and so on. We are now developing new technology to make accurate predictions of those extreme weather events using ultra-high-resolution simulations and big data obtained from satellite-based observation technologies and ground radars. It enables us to assess the level of various damages caused by the extreme weather events. In addition, the system is being developed to predict the changes of characteristics of typhoons under global warming condition, which means the challenge to predict extreme events in longer range time scale from weeks to decades. Furthermore, a platform technology is being developed to assess and predict the level of particles emitted through human activity—including aerosols, greenhouse gases, and PM 2.5—and assess their effect on the weather and environment. These technologies are expected to contribute to the formulation of policies for protecting the environment, preventing disasters, and maintaining public health.
14:00 - 14:10
Break
Distinguished Achievements in K and related to Post-K (2) (Chair: Takahito Nakajima)

14:10 - 15:10
Accelerated Development of Innovative Clean Energy Systems : Post-K Project Priority Issue 6

Shinobu Yoshimura (Univ. of Tokyo), Tomonori Yamada (Univ. of Tokyo), Naoki Shikazono (Univ. of Tokyo), Akiyoshi Iida (Toyohashi Univ. of Tech.), Yasuhiro Idomura (Japan Atomic Energy Agency)

Abstract (click to expand) PDF (click to create a new window)

We are working on one of the nine priority issues of the FLAGSHIP 2020 project, i.e. Priority Issue 6: Accelerated Development of Innovative Clean Energy Systems. Any source of energy that is superior in all aspects—cost, environmental impact, safety, and use of natural resources—does not exist. Therefore various types of energy systems including clean energy systems have been developed in worldwide. To accelerate the development of such energy systems, computational simulations are strongly expected to play a key role. However, core physics in such systems are very complex and tend to be multi-scale and multi-physics among fluid, thermal, solid, electromagnetics and so on. The Priority Issue 6 targets the following innovative clean energy systems, i.e. carbon-free coal gasification plants (Sub-issue A), low-cost and long-lasting fuel cells (Sub-issue B), large-scale offshore wind farm (Sub-issue C), and nuclear fusion reactors (Sub-issue D). These sub-issues are independent of each other in terms of energy sources, but they have many commonalities in their central physical phenomena (structure, fluid, heat, electromagnetism, material degradation, etc.) and in the simulation problems to be overcome. The coal gasification is one of the key technologies to drastically reduce CO2 emission from coal fired power generation. Coal is crushed into fine particulate matter and then partially burned into gas in a high-pressure and high-temperature environment. We perform a large scale two-way coupled simulation of thermo-combustion-fluid-melting-structure interaction of a full-scale testing reactor. An offshore wind farm for power generation typically consists of tens to hundred of large scale wind turbines. To promote the energy in Japan, we need to take care of heavy weather conditions and narrower offshore sites. To improve the performance of power generation of the whole wind farm and the reliability of individual wind turbine, it is necessary to improve the accuracy of evaluating the degradation of power generation of wind turbine affected by wake interaction, as well as to improve the reliability evaluation of individual wind turbine exposed to the wake. In this presentation, focusing on the coal gasification system and the offshore wind farm, we explain the latest developments and achievements of multiphysics simulations in which a unique parallel coupler, REVOCAP_Coupler, is used to integrate highly parallelized independent solvers such as LES-based flow and combustion solvers, solid and thermal conduction solvers, i.e. FFR-Comb and ADVENTURE_Thermal for coal gasification reactor, and FFB and ADVENTURE_Solid for offshore wind farm.

Massively parallel density matrix renormalization group method algorithm for two-dimensional strongly correlated systems and its applications

Shigetoshi Sota (RIKEN), Takami Tohyama (Tokyo Univ. of Sci.), Seiji Yunoki (RIKEN)

Abstract (click to expand)PDF (click to create a new window)

A strongly correlated quantum system is a class of quantum systems where many-body interactions play an essential role and cannot be treated by first-principles calculations based on the density functional theory. The density matrix renormalization group (DMRG) procedure is known as one of the most powerful and accurate numerical methods for one-dimensional strongly correlated quantum systems. To the contrary, in the two- or higher-dimensional systems, the DMRG method has been less accurate because to obtain the accurate physical quantities in higher-dimensions the DMRG method requires an exponentially large DMRG truncation number m, which makes the practical calculations impossible by using the standard computer facilities. This difficulty is mentioned by the area-row of the entanglement entropy of a pure state such as the ground state. However, the recent development of supercomputing systems enables us to perform the DMRG calculations even for two-dimensional strongly correlated systems. In the present study, we have developed a massively parallel two-dimensional DMRG algorithm. Using the K computer, we can perform the DMRG calculation by taking a huge m [1-6]. Combing with the kernel polynomial expansion method, we are able to simulate quantum dynamics for two-dimensional strongly correlated quantum systems [7-9]. In this presentation, we show our developed massively parallel DMRG algorithm and its applications.

Validation of alternative technology by direct turbulence simulation for towing tank experiment

Tatsuo Nishikawa (Shipbuilding Research Centre of Japan)

Abstract (click to expand)PDF (click to create a new window)

In the 1990s, many researchers of Computational Fluid Dynamics in the marine engineering field actively worked on the development of a new turbulence model in the framework of Reynolds-averaged Navier Stokes (RANS) Equation method. They mainly tuned to match the hull resistance and the flow pattern in the propeller section with the towing tank experiment. In the 2000s, some specialized tools were developed to enable ship designers to use easily in practice and began to be utilized widely in the early stages of hull designing. In the 2010s, estimation of the interference effect between hull and the propeller by self-propelled simulation became feasible as well as the resistance increase in wave and also cavitation characteristics. On the other hand, tank test has not been completely replaced by the simulation due to the accuracy and reliability problem of RANS. The most critical problem of RANS is that the turbulence model has to be tuned each time a new type of hull-form comes out by using towing tank test. And this tuning requires enormous effort and moreover, it sometimes occurs puzzling situation such as hull resistance matches, but the wake-flow distribution does not match, or vice versa. The purpose of this research is to develop an estimation technology of ship-propulsion performance with accuracy and reliability equivalent to towing tank test using Fully-resolved Large Eddy Simulation. This is a shift from the turbulence model dependent type to high-fidelity first principle approach. We have been conducting validation and verifications using results of towing tank test for several kinds of ship-types in collaboration with eight domestic major shipyards. This project started in 2011 and scheduled to complete in 2018. The cost of computational resource for a single vessel including both resistance and self-propulsion case estimated in the year 2012 was about 190 million yen, but estimation in 2018 is 3.7 million yen. Compared to towing tank tests costing from 5 million yen to 10 million yen per one vessel, now the price has been sufficient to satisfy the investment advantage of shipyards. For practical use in the real sense, we need more time for transition period to shift from tank test to computational simulation and activities related to this will also be introduced in the presentation.
15:10 - 15:20
Break
Panel Discussion

15:20 - 16:20
Panel Discussion: From K to Post-K

Moderator: Chisachi Kato (Univ. of Tokyo)
Panelist: Takumi Washio (UT-Heart), Kohei Fujita (University of Tokyo), Keiko Takahashi (JAMSTEC), Shinobu Yoshimura (University of Tokyo), Synge Todo (University of Tokyo), Tatsuo Nishikawa (Shipbuilding Research Centre of Japan)
Poster

16:20 - 18:00
See accepted posters below.
18:00 - 19:30
Reception

Feb 19, 2019 (Day 2)

9:00 - 9:30
Registration
Invited Talk (Chair: Florence Tama)

9:30 - 10:50
HPC for biomaterials: why playing soccer hurts

Frauke Gräter (HITS)

Abstract (click to expand)PDF (click to create a new window)

Materials - be it a shoe sole or an achilles tendon - respond to mechanical stress involving length scales all the way down to atoms and electrons, rendering computational materials science a prime application area of HPC. I will show how large-scale simulations of collagen have fundamentally changed our understanding how tension in this biomaterial, in our achilles tendon or muscle, can increase pain or muscle sour. I will give an outlook on how exascale can go beyond these achievements. The vision is to watch microcracking in materials all the way to failure, at relevant dimensions and yet at atomistic detail.

Exascale Computing, Artificial Intelligence and Cancer

Rick L. Stevens (Argonne National Laboratory/The University of Chicago)

Abstract (click to expand)PDF (click to create a new window)

In this talk I'll give an update on the US Exascale Computing Initiative (ECI). The ECI is a national plan to develop and field multiple Exascale computing systems starting in 2021. In addition to developing systems the ECI is supporting the development of a broad collection of Exascale applications and software environments to support the mission of the Department of Energy (DOE) and the broad needs of the US research community. Artificial Intelligence (AI) based applications are emerging as important new drivers of advanced computing capabilities across many scientific disciplines. I'll discuss key "AI in Science" opportunities that will be enabled by Exascale systems and walk through some of the architectural, software and methods challenges. Finally, I will present an overview of the joint program of the DOE and the National Cancer Institute to apply Exascale computing to three Cancer research challenge problems (i.e. drug response prediction, RAS biology and understanding patient trajectories).
10:50 - 11:00
BREAK
Distinguished Achievements in K and related to Post-K (3) (Chair: Yasumichi Aoki)

11:00 - 12:00
Revealing Drug-Target Binding Pathway using Two-dimensional Replica-Exchange Molecular Dynamics Method

Suyong Re (RIKEN), Hiraku Oshima (RIKEN), Kento Kasahara (RIKEN), Motoshi Kamiya (National Institutes of Natural Sciences), Yuji Sugita (RIKEN)

Abstract (click to expand) PDF (click to create a new window)

Traditionally, drug compounds are designed to exhibit an optimal thermodynamic property, namely binding affinity to a target protein. However, many compounds still lack in vivo efficacy. In the past decade, the residence time concept has received increasing attention, where the drugs are designed to stay around the target protein much longer than the others. This kinetic-based approach has received increasing attention. The rational control of kinetics requires the knowledge of underlying free-energy landscapes, but their characterization is fundamentally difficult. Here, we apply the Replica-exchange molecular dynamics (REMD) method to study drug-target binding pathway. This method, by exchanging parameters, prevents computation from being trapped by local minima and enables the rugged free energy landscape to be elucidated. REMD has been widely used for the studies of protein folding and conformational change of proteins but much less for protein-ligand binding. We propose a combined replica-exchange umbrella sampling (REUS) and generalized replica-exchange with solute tempering (gREST) scheme for drug-target binding simulation. In brief, our scheme, referred to as gREST/REUS, conducts replica exchanges in two dimensions: the exchange along drug-target distance to accelerate the (un)binding occurrence and temperature exchange to further enhance the sampling. The temperature exchange is performed in a subsystem, solute molecule, for efficiency. We define the solute as dihedral-angle and nonbonded energy terms of a ligand and binding site residues. With the aid of enhanced flexibility of drug and protein binding site, we realize multiple binding and unbinding events in the simulation and predict in unprecedented accuracy not only the X-ray binding structure but also multiple transient intermediates along the binding pathway. The gREST/REUS is available in GENESIS program package and generally applicable to study drug-target bindings. The details of results on an inhibitor binding to a protein kinase will be discussed in this presentation.

First-principles sampling simulation approaches to battery science and technology

Yoshitaka Tateyama (National Institute for Materials Science(NIMS))

Abstract (click to expand)

Development of next-generation battery is a most important issue in modern society. For emerging technologies such as electric vehicle and smart grid, batteries with higher energy and power densities, higher safety and longer durability are needed. From the viewpoint of materials science, elucidation of all electronics and atomic processes in the battery is essential. However, the in-situ experimental observations are still difficult and many problems remain unsolved. Then, we have addressed understanding the battery microscopic mechanisms by means of first-principles sampling simulation approaches based on density functional theory (DFT) calculations of the electronic states with reasonable accuracy. By adding the multiple parallelisation functions to a DFT molecular dynamics (MD) open source “CPMD” [1], we constructed “stat-CPMD extension” for efficient sampling and free energy profile calculation of chemical and redox reactions on the K computer. The stat-CPMD code showed FLOPS efficiency over 30% and SIMD ratio over 70% in the best cases. We then addressed long-standing issues of the SEI (solid electrolyte interphase) formation and properties at anode – organic electrolyte interfaces, which really determines the performance and the reliability of the battery. Besides, we have investigated the electrochemical and transport properties of a new class of electrolyte, “highly-concentrated electrolyte”, emerging recently. Regarding the SEI issue, we found a novel mechanism of the VC additive effect on the initial reductive decomposition of EC solvent [2], and then proposed a probable subsequent mechanism with the decomposed products toward the SEI formation at the electrode – electrolyte interface [3,4], which overturns the conventional understandings. For the latter, we elucidated origins of the unusual electrochemical stability and excellent ion transport in superconcentrated electrolytes [5-9]. These findings via DFT sampling simulations provide new perspectives for the battery science and technology. These works were done in collaboration with Dr. Keitaro Sodeyama (NIMS), Dr. Yukihiro Okuno and Keisuke Ushirogata (FUJIFILM corporation), Prof. Yuki Yamada and Prof. Atsuo Yamada (The University of Tokyo), and partly supported by MEXT as “Priority Issue (No. 5) on Post K computer” (Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use). [1] CPMD (IBM Corp/Max-Planck Institut für Feskörperforschung Stuttgart, 1990–2015); http://www.cpmd.org [2] K. Ushirogata, Y. Tateyama et al., J. Am. Chem. Soc. 135, 11967-11974 (2013). [3] K. Ushirogata, Y. Tateyama et al., J. Electrochem. Soc. 162, A2670-2678 (2015). [4] Y. Okuno, Y. Tateyama et al., Phys. Chem. Chem. Phys. 18, 8643-8653 (2016). [5] Y. Yamada, Y. Tateyama, A. Yamada, et al., J. Am Chem. Soc. 136, 5039-5046 (2014). [6] K. Sodeyama, Y. Tateyama et al., J. Phys. Chem. C 118, 14091-14097 (2014). [7] J. Wang, Y. Yamada, Y. Tateyama, A. Yamada et al., Nat. Commun. 7, 12032 (2016). [8] Y. Yamada, Y. Tateyama, A. Yamada et al., Nat. Energy 1, 16129 (2016). [9] J. Wang, Y. Yamada, Y. Tateyama, A. Yamada et al., Nat. Energy 3, 22-29 (2018). --> Development of next-generation battery is a most important issue in modern society. For emerging technologies such as electric vehicle and smart grid, batteries with higher energy and power densities, higher safety and longer durability are needed. From the viewpoint of materials science, elucidation of all electronics and atomic processes in the battery is essential. However, the in-situ experimental observations are still difficult and many problems remain unsolved. Then, we have addressed understanding the battery microscopic mechanisms by means of first-principles sampling simulation approaches based on density functional theory (DFT) calculations of the electronic states with reasonable accuracy. By adding the multiple parallelisation functions to a DFT molecular dynamics (MD) open source “CPMD” [1], we constructed “stat-CPMD extension” for efficient sampling and free energy profile calculation of chemical and redox reactions on the K computer. The stat-CPMD code showed FLOPS efficiency over 30% and SIMD ratio over 70% in the best cases. We then addressed long-standing issues of the SEI (solid electrolyte interphase) formation and properties at anode – organic electrolyte interfaces, which really determines the performance and the reliability of the battery. Besides, we have investigated the electrochemical and transport properties of a new class of electrolyte, “highly-concentrated electrolyte”, emerging recently. Regarding the SEI issue, we found a novel mechanism of the VC additive effect on the initial reductive decomposition of EC solvent [2], and then proposed a probable subsequent mechanism with the decomposed products toward the SEI formation at the electrode – electrolyte interface [3,4], which overturns the conventional understandings. For the latter, we elucidated origins of the unusual electrochemical stability and excellent ion transport in superconcentrated electrolytes [5-9]. These findings via DFT sampling simulations provide new perspectives for the battery science and technology. These works were done in collaboration with Dr. Keitaro Sodeyama (NIMS), Dr. Yukihiro Okuno and Keisuke Ushirogata (FUJIFILM corporation), Prof. Yuki Yamada and Prof. Atsuo Yamada (The University of Tokyo), and partly supported by MEXT as “Priority Issue (No. 5) on Post K computer” (Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use). [1] CPMD (IBM Corp/Max-Planck Institut für Feskörperforschung Stuttgart, 1990–2015); http://www.cpmd.org [2] K. Ushirogata, Y. Tateyama et al., J. Am. Chem. Soc. 135, 11967-11974 (2013). [3] K. Ushirogata, Y. Tateyama et al., J. Electrochem. Soc. 162, A2670-2678 (2015). [4] Y. Okuno, Y. Tateyama et al., Phys. Chem. Chem. Phys. 18, 8643-8653 (2016). [5] Y. Yamada, Y. Tateyama, A. Yamada, et al., J. Am Chem. Soc. 136, 5039-5046 (2014). [6] K. Sodeyama, Y. Tateyama et al., J. Phys. Chem. C 118, 14091-14097 (2014). [7] J. Wang, Y. Yamada, Y. Tateyama, A. Yamada et al., Nat. Commun. 7, 12032 (2016). [8] Y. Yamada, Y. Tateyama, A. Yamada et al., Nat. Energy 1, 16129 (2016). [9] J. Wang, Y. Yamada, Y. Tateyama, A. Yamada et al., Nat. Energy 3, 22-29 (2018).

Statistical Computational Cosmology with Big Astronomical Data

Naoki Yoshida (Univ. of Tokyo)

Abstract (click to expand)PDF (click to create a new window)

I introduce the recent development of computational cosmology and the future prospect in its data-science aspect. I present the results from our ongoing project that utilizes Japan's Subaru telescope to detect distance supernovae and galaxies, and to probe the distribution of matter in the Universe. High performance computing is playing a vital role in the project. For instance, a large number of observational mock catalogues are generated from supercomputer simulations of cosmic structure formation. Such simulations also enable us to develop an efficient emulator that calculates a variety of statisitics of galaxy/matter distribution in the universe. Ongoing and future sky surveys will deliver data of exabyte volume. A concerted use of physics, statistics, computer science, simulations, powerful computers, etc etc are needed in the coming decade.
12:00 - 13:30
Lunch
Invited Talk (Chair: Mitsuhisa Sato)

13:30 - 15:30
Tianhe-3 and the Exascale Road in China

Ruibo Wang (National University of Defense Technology)

Abstract (click to expand)

After a brief introduction to the Tianhe-1 and Tianhe-2 supercomputers, this talk focuses on the progress of Tianhe-3 exascale system, including the prototype system development, the new chips, and the new interconnection network. Major issues on exascale system development and the Tianhe application ecosystem will be discussed. The perspective of Exascale systems development in China, especially the projects under the national key R&D program are discussed toward the end of the talk. --> After a brief introduction to the Tianhe-1 and Tianhe-2 supercomputers, this talk focuses on the progress of Tianhe-3 exascale system, including the prototype system development, the new chips, and the new interconnection network. Major issues on exascale system development and the Tianhe application ecosystem will be discussed. The perspective of Exascale systems development in China, especially the projects under the national key R&D program are discussed toward the end of the talk.

EPI, Europe strikes back on HPC

Yingchih Yang, European Processor Initiative

Abstract (click to expand)PDF (click to create a new window)

European Processor Initiative (EPI) represents the ambition of Europe in HPC field. This joint project backed by EU H2020 framework contains 23 project members from industrial and scientific segments. It aims to develop critical technologies and generate key components for next HPC systems: a high performance -low energy consumption processor. In this talk we will disclose the project outlook and the vision of the team.

Arm in HPC

Brent Gorda (Arm)

Abstract (click to expand)PDF (click to create a new window)

Arm-based hardware and software products are recent entrants into HPC. The goal is to bring architectural choice back to HPC with Arm-based servers. Already these server-class designs are starting to appear and HPC performance is available. On the software side, Arm is working closely with the growing ecosystem to ensure a mature and robust stack. In this talk, we will discuss the progression of Arm in HPC and macro-trends motivating a shift in architecture for Cloud and Data Center.
15:30 - 15:45
BREAK
Panel Discussion

15:45 - 17:30
Towards Arm eco-system

Moderator: Satoshi Matsuoka (R-CCS)
Panelist: Ruibo Wang (National University of Defense Technology), Yingchih Yang (European Processor Initiative), Brent Gorda (Arm), Jacqueline H. Chen (Sandia National Laboratory) Rick L. Stevens (Argonne National Laboratory/The University of Chicago)
17:30 - 17:40
Closing

List of Accepted Posters

Society with health and longevity

1 Cloud computing and containerization in Big Data analytics with application in life science

Marco Capuccini(Uppsala University)

Abstract(Click to expand)

Modern science is increasingly driven by data-intensive processing. Processing large-scale datasets requires a large amount of hardware resources, making infrastructure procurement challenging. To this extent cloud computing represents an interesting opportunity. In fact, cloud resources come as virtualized infrastructure that can be allocated on demand. In hybrid settings, cloud enables to take advantage of in-house resources as well as leveraging on externals providers when needed. In state-of-the-art setups, application containers are leveraged to package and deploy software components over a geographically disperse federation of cloud-based and bare-metal data centers. Like virtual machines, application containers isolate and encapsulate software and dependencies, but drastically reduce overhead by not requiring virtualization.This presentation reports success stories in leveraging cloud infrastructure and application containers for data-intensive processing in three European initiatives, in which we were actively involved: (i) PhenoMeNal project (http://phenomenal-h2020.eu) in the field of large-scale medical metabolomics, (ii) the HASTE project (http://haste.research.it.uu.se) in the field of deep learning-aided microscopy imaging and (iii) the OpenRiskNet project (https://openrisknet.org) in the field of risk assessment of chemicals. Basing on our experience, we will discuss lesson learned, best practices and pitfalls of the technology, and integrations with the HPC ecosystem.
2 Development of analyzing method for reaction pathway using molecular dynamics

Chigusa Kobayashi(Riken center for computational science), Yasuhiro Matsunaga(Riken center for computational science), Jaewoon Jung(Riken center for computational science), Yuji Sugita(Riken center for computational science)

Abstract(Click to expand)

Molecular dynamics (MD) is a method of calculating motion of particles by numerically solving the equation of motion using interaction forces between atoms. In biophysics and biochemistry, it is widely used to investigate relationship between biomolecular structure and function as well as those between the conformational dynamics and function. MD can reveal atomically detailed description of biomolecular structures on reaction pathway. In many cases, however, the simulation time is shorter than time scale of the reaction. We have developed a high-performance MD simulation package, GENESIS, to perform MD simulations of biomolecules efficiently on K computer. In addition, we have recently introduced a rare event sampling method, string method, and introduced it into GENESIS. By using the method, we focus on a calcium pump, Sarco(endo)plasmic reticulum Ca²⁺-ATPase (SERCA). SERCA is a representative membrane transport protein. The protein moves Ca²⁺ ions by utilizing ATP-hydrolysis against its large concentration gradient. In this study, we performed the string method to investigate a reaction pathway on dissociation of nucleotide. We also calculated molecular dynamics simulations for two end states of the reaction. From these simulations, we discuss conformational fluctuations and changes of SERCA along the reaction pathway.
3 New free-energy calculation method using replica-exchange umbrella sampling combined with Gaussian accelerated molecular dynamics

Hiraku Oshima(RIKEN), Suyong Re(RIKEN), Yuji Sugita(RIKEN)

Abstract(Click to expand)

Enhanced sampling methods have been widely used for molecular dynamics simulations of protein folding and conformational change. Recent studies have shown that a combination of two enhanced sampling method can further increase the sampling efficiency. For example, combining replica-exchange umbrella sampling (REUS) with generalized replica-exchange with solute tempering (gREST), we realized multiple binding poses of a ligand which are not observed experimentally. However, the combination requires extremely large computational burden because a large number of copies of the system (replicas) must be simulated parallel. In order to save computational resources, we here propose the combination of REUS with Gaussian accelerated molecular dynamics (GaMD), which is referred to as GaMD/REUS. GaMD accelerates the conformational sampling of biomolecules by adding a non-negative boost potential to the system potential energy. GaMD does not require predefined reaction coordinates, which is a significant advantage because expert knowledge of the biomolecular systems of interest is essential for the definition of reaction coordinates. We implemented GaMD/REUS into GENESIS program package and applied it to several biomolecules including peptide folding. Although GaMD/REUS requires only the same amount of computational resources as REUS does, it can largely increase the convergences of free-energy calculations. We expect GaMD/REUS to be useful for ligand binding simulation and drug discovery, which is one of the priority issues of the post-K computer.
4 Crowder effects on a protein-ligand binding process

Kento Kasahara(RIKEN), Hiraku Oshima(RIKEN), Grzegorz Nawrocki(Michigan State University), Isseki Yu(Maebashi Institute of Technology), Suyong Re(RIKEN), Michael Feig(Michigan State University), Yuji Sugita(RIKEN)

Abstract(Click to expand)

Protein-Ligand Binding process is ubiquitous in biological systems. The reaction rate is mainly determined from the approaching process of ligand to receptor protein (diffusion) and insertion process. Recently, ligand binding process has been extensively investigated by means of microsecond-scale molecular dynamics (MD) simulation. These previous works focus on the reaction in diluted systems. However, in biological systems, both protein and ligand molecules are surrounded by other proteins and metabolites referred to as crowded environments, and hence the reaction is strongly affected by them. In the present study, we elucidate the crowder effect on ligand binding process based on MD simulation. The target receptor protein and ligand molecule are Src kinase and PP1 inhibitor, respectively. The kinase has many binding sites, in addition to the canonical binding site. Eight Bovine serum albumin (BSA) protein molecules are placed in the system as crowder agents. We found that the large portion of the ligand molecules are trapped by the crowder protein surfaces, corresponding to the decreasing of the ligand concentration around the kinase. At the same time, the spatial distribution function (SDF) analysis for the ligands indicates that the many binding sites of the kinase is sterically blocked by the crowder. We also found that the approaching of the ligands to the kinase via the surface of the crowder molecules often occurs, meaning that the approaching pathway is significantly changed from that in dilute system.
5 Building a human-scale cerebellar network model on K computer using MONET simulator

Hiroshi Yamaura(The University of Electro-Communications), Jun Igarashi(Head Office for Information Systems and Cybersecurity, RIKEN), Tadashi Yamazaki(The University of Electro-Communications)

Abstract(Click to expand)

Human-scale whole-brain network simulation is an ambitious challenge for computational neuroscience and high-performance computing. K computer has been used for this purpose, and its successor Post-K is expected to achieve the goal. The human cerebellum holds 80% of all neurons in the human brain. The human cerebellum contains about 69 billion neurons. Although supercomputers are substantially much more powerful than standard workstations for large-scale neural network simulations, it is difficult to pull out the substantial performance, because users need to rewrite source codes to optimize to a target supercomputer. We developed a neural network simulator called MONET (Mille-feuille like Organization NEural neTwork). The MONET simulator can calculate layered sheet types of neural networks with parallelization by tile partitioning. Since the cerebellum has a very regular structure, the MONET simulator is suitable for the simulation of the cerebellar neural network model. We built a spiking neural network model of the cerebellum based on electrophysiological and anatomical data on K computer using the MONET simulator. We investigated the network dynamics of the cerebellar network model. The cerebellar network model exhibited a reservoir-like activity pattern of granule cells in response to constant input signals, which would be useful for cerebellar reservoir computing. We also performed a simulation of optokinetic response that is one of the simplest forms of cerebellum-dependent task. The simulated network dynamics was qualitatively similar to that observed experimentally in animals. Next, we analyzed weak scaling property of the cerebellar network model. In the weak scaling, network size per compute node is fixed while the number of compute nodes are varied. We varied the number of compute nodes at from 1024 to 82944 while increasing the network size of the cerebellar model, and measured the computational time. The good scaling property was obtained in the cerebellar network model. Using full nodes (82944 nodes) of K computer, we succeeded to build a cerebellar model composed of almost the same number of neurons, 68 billion, in the human brain. The MONET simulator can connect the cerebellar network model and network models of other brain regions (cerebral cortex, thalamus, and basal ganglia). In conclusion, the MONET simulator is useful for human-scale whole-brain network simulation on Post-K computer.
6 Simulating large-amplitude transitions in proteins with a coarse-grained model

Ai Shinobu(RIKEN Center for Computational Science), Yasuhiro Matsunaga(RIKEN Center for Computational Science), Chigusa Kobayashi(RIKEN Center for Computational Science), Yuji Sugita(RIKEN Center for Computational Science)

Abstract(Click to expand)

Molecular dynamics (MD) simulations of biomolecules are widely used to investigate conformational dynamics and structure-function relationship. All-atom (AA) models provide the most accurate description of the underlying dynamics. However at present, even with the fastest computers the time-scale attainable with an atomistic simulation is up to a millisecond, whereas biologically relevant motions occur at the time scale of milliseconds to seconds. To overcome this limitation, Coarse-grained (CG) modeling can be utilized. The use of CG models reduces the computational time by several orders of magnitude, allowing access to time-scales unreachable by conventional AA MD simulations. In our group we develop GENESIS, a high-performance MD program for simulating proteins in solution and membrane environments. An efficient parallelization scheme makes it highly scalable, with the ability to simulate huge systems on the atomistic level using massively parallel supercomputers. Implementation of an efficient simulator for CG-modeled systems in the GENESIS platform will make it feasible to attain time- and length-scales which were inaccessible thus far, getting us a step closer to the ultimate goal of understanding biological functions of macromolecules in realistic cellular environment. We focus our efforts on implementing applications such as Brownian dynamics for crowded environment, a CG model for targeting inter-domain motions, as well as a multi-basin CG model for simulating structural transitions, the latter is the subject of the work presented herein. Biochemical reactions are often coupled with large-amplitude structural transitions, a very common case is a protein transitioning from an open to closed state upon substrate binding. In this work, we designed a scheme to effectively simulate such transitions and implemented it in GENESIS as part of the CGMD platform. We discuss the results obtained by applying the scheme to several well-known systems.
7 Integrative modeling of protein dynamics from time-series data of single-molecule experiments and molecular dynamics simulations

Yasuhiro Matsunaga(RIKEN), Yuji Sugita(RIKEN)

Abstract(Click to expand)

Single-molecule experiment and molecular dynamics (MD) simulation are indispensable tools for investigating protein conformational dynamics. The former provides time-series data, such as donor-acceptor distances, while the latter gives atomistic information, though often biased by force field parameters. Previously, we developed a method to combine the complementary information from the two approaches and construct a consistent model of conformational dynamics. Here, we apply the method to the folding dynamics of protein G. First, MD simulations led to an initial Markov state model (MSM), which was then “corrected” using single-molecule Förster resonance energy transfer (FRET) data through hidden Markov modeling. We investigate the folding pathways in the original and corrected MSMs and discuss their folding mechanisms.
8 Spatial information processing in a spiking neural network model for the rodent primary somatosensory cortex

Zhe Sun(RIKEN), Jun Igarashi(RIKEN)

Abstract(Click to expand)

Primary somatosensory cortex (S1) locates in the parietal lobe and it processes the sensory information from whole body. Understanding the structure and function of S1 is critical for figuring out the information process mechanism in the sensory nervous system.Spatial organization of connections in the sensory cortex is considered to work as an information processing device. However, it remains unknown how different types of connections with different spatial extent work for sensory processing in the primary somatosensory cortex (S1). Here based on anatomical and electrophysiological data of rodents, we developed a spatial spiking neural network model for the S1.The S1 model comprised 7 layers (L1: 2 inhibitory neuron types; L2 and L3: 3 inhibitory and 1 excitatory neuron types; L4, L5A, L5B & L6: 2 inhibitory and 1 excitatory neuron types). We used the layer thicknesses and the cell densities of the mouse’s S1 data. Leaky integrate-and-fire neuron model was used for all neuron types. We used the information of spatial extents, probabilities, and connectivity from the reports of laser-scanning photostimulation (LSPS) experiments and patch clamp recordings. We used Gaussian functions for the connection probability function of a horizontal distance between two neurons. We performed S1 simulation on the K computer with NEST version 2.16. We used a time step of 0.1 ms. When we performed simulation of 1mm2 of S1 on 50 compute nodes using hybrid parallelization with 8 OpenMP threads per one MPI process, the simulator took about 1045 seconds for the creation of the network and about 342 seconds per 1s biological time.We made a virtual slice of the S1 whose shape was a cube of 1400 x 400 x 1400 micron. We first performed virtual LSPS experiments for excitatory and inhibitory connections to all neuron types. The responses of neurons to LSPS were qualitatively similar to those in the LSPS experiments. Most importantly, to investigate the relation between excitatory and inhibitory signals, we compared the excitatory and inhibitory conductance with changing distances between neurons with external stimulation and recorded neurons. The excitatory and inhibitory synaptic conductance of L2/3 and L5 excitatory neurons similarly decayed with increasing in the horizontal distance between stimulation sites and positions of recorded neurons, which is similar to real experimental results. These results suggest that spatial extents of different connections may cause spatially coupled excitation and inhibition in L2/3 and L5A, which may lead to cooperative information processing by excitation and inhibition.
9 GENESIS developments for high performance computation

Jaewoon Jung(RIKEN), Chigusa Kobayashi(RIKEN), Takaharu Mori(RIKEN), Yuji Sugita(RIKEN)

Abstract(Click to expand)

GENESIS (GENeralized Ensemble Simulation System) is a software package for molecular dynamics simulations of biological systems. It is designed to extend the spatiotemporal scale by efficient parallelization scheme and enhanced sampling algorithm. To maximize the parallel efficiency of GENESIS on K, we developed domain decomposition scheme named midpoint cell method, volumetric decomposition of fast Fourier transform (FFT). Now, GENESIS is scalable up to 262,000 CPU cores on K computer for the all-atom system consisting of about 100 million atoms. GENESIS shows good parallel efficiency not only on K but also on PC clusters with various architecture including GPU and Xeon Phi. Recently, we’re doing further developments on the next generation supercomputers including post K.
10 Cocktail-Party Single-Speaker Separation using a Convolutional Neural Network based on a Denoising Autoencoder

Kundan Kumar(Indian Institute of Science), Saurabh Kumar Gupta(Indian Institute of Science), Chetan Singh Thakur(Indian Institute of Science)

Abstract(Click to expand)

The human auditory cortex is able to recognize and focus on selective voices in a noisy and multi-speaker environment. Speech separation is an important research problem in the fieldsof neuroscience, computer science, health-care application and auditory modelling. Traditionally, audio separation is modelled as information processing, which includes the design of filters, hand-selected features, and computational modelling of the human auditory cortex. Recently, deep-learning-based speech separation models have attracted significant attention, and deep neural network frameworks have been shown to be effective in learning the useful representations of a target speaker from a mixture of speakers. In this paper, we introduce a novel approach for the segregation of monaural sound mixtures based on Denoising Autoencoders (DAEs) using a convolution neural network (CNN) along with a spectral mask framework. Specially, we explore a training scheme for an encoder-decoder network based on a Kullback–Leibler (KL) divergence cost function proposed by Lee et al [1]. We evaluate our model using a mixture of speakers created from the LibriSpeech ASR dataset. The performance of the reconstructed audio is evaluated by calculating the signal-to-noise ratio (SNR) with respect to clean audio of the target speaker. We compared SNR calculated on samples from test data for KL divergence loss with L2 loss. The KL divergence loss function offers a better reconstruction of the audio source from its mixture as compared to mostly used L2 losses. In future, we will train DAE with more biologically plausible features of the speaker from the CAR-FAC Model of the Cochlea proposed in [2]. We will also expand this work to implement neuromorphic hardware accelerator for intelligent auditory attention speaker segregation.
11 Deep Generative learning based computationally efficient ultrasound imaging system for health care application

Saurabh Kumar Gupta(Indian Institute of Science, Bangalore), Kundan Kumar(Indian Institute of Science`), Chetan Singh Thakur(Indian Institute of Science)

Abstract(Click to expand)

High-resolution ultrasound imaging systems generate a huge amount of data with a high frame rate to provide high-resolution imaging for biomedical applications. It requires computational intensive resources and high-speed data links, further making the system bulky and power consuming. This results in a lack of portability and deployment on the remote-location where power budget is limited, thus restricting the healthcare facility accessibility. There have been always a tradeoff between the quality of image and cost of equipment, which limits the use of ultrasound imaging at an affordable cost. In recent year, the problem of recovering undersampled measurements has shown a growing interest along with the emergence of compressed sensing (CS) framework. In ultrasound imaging, compressive sensing framework has been used for compressed data acquisition and beamforming, opening a path to the reconstruction of high-resolution images with undersampled data. However, the CS reconstruction involves the use of convex optimization algorithms that require hundreds of iterations to converge, which limits its use in real-time implementations in ultrasound imaging systems at the health-care node. The goal of our work is to present a non-iterative algorithm for recovering undersampled ultrasound images using Generative models (Variational Auto-encoders). Variational Auto-encoders [1] provide an efficient, method to recover a latent representation z ("encoding") of our data-points x (the "decoded" observations). By training VAEs on a large dataset, we develop an effective encoding mechanism for our observations, which can be used to either generate realistic new data or reconstruct a data item from a compressed measurement. The proposed framework can also be used for other medical imaging areas like computed tomography, rapid MRI and neuronal spike train recovery. We will further explore the system level implementation of hardware with less computational complexity and low power consumption.
12 constructing bottom up simulation of insect brain for understanding elementary intelligence on the massively parallel environment

Tomoki Kazawa(The University of Tokyo), Daisuke Miyamoto(The University of Tokyo), Heewon Park(The Graduate School of Information Science and Technology, The University of Tokyo), Buntaro Sakai(The Graduate School of Information Science and Technology, The University of Tokyo), Hayato Tsunoda(The Graduate School of Information Science and Technology, The University of Tokyo), Tetsuya Fukuda(The University of Tokyo), Ryohei Kanzaki(The University of Tokyo)

Abstract(Click to expand)

We chose the road via a reproduction of a small insect brain by a detailed simulation on the big computational resource to understand the intelligences originate from the neural circuit. The real-time simulation of a whole insect brain scale (one million neurons) on post K computer is our big milestone at 2020.An antennal lobe, the first olfactory center in the insect brain, is a good model system for building simulation model due to the abundant data and easiness of physiological experiment. We built the simulation model of silkmoth antennal lobe based on the pheromonal receptor activities, dose response characteristics of projection neuron, pharmacological experiments to GABA receptor, and so on. Normalized olfactory response of antennal lobe projection neuron to tetanus pheromonal stimuli observed in our experiments(Fujiwara 2014) were reproduced in our simulation model that use GABA_B receptor as the mutual inhibition mechanism among local inter neuron on the point neuron H-H model.For the more detailed simulation, we introduced multi-compartment models based on the morphology of silkmoth antennal lobe neurons stored in the our Bombyx Neuron Database(BoND). The pheromonal stimuli to MGC in the dorsal antennal lobe and mechano-sensory input in the ventral part of antennal lobe from AMMC and a part of general odor identification mechanism were implemented in the simulation for the multisensory integration in the natural environment.The simulation of optic lobe, which is visual information processing unit of insects is being implemented for understanding visual flow processing on the based of connectome in drosophila. However, it may be accompanied by a vast work to define the values of numerous parameters of the biophysical detailed model. We have been developing a massively parallelized parameter estimator on supercomputer in order to treat massive parameters not by hand but automatically. We succeeded the estimation of more than a few thousand parameters about synapse mechanisms in neural circuit simulation consist of a few of ten neurons. simultaneously by the combination of our massively parallelized simulator on K computer (NEURON K+) and new implementation of mpLMCMA-ES developed from the CMA-ES algorism, a solver based on evolutional algorism. The slow alternating activity of pre-command center of silkmoth could be reproduced by our estimator. We are now constructing detailed antennal lobe model by the combination of the macroscopic and microscopic parameter estimations.
13 Parallel computing of cortio-thalamo-cerebellar circuit using tile partitioning parallelization method on K computer

Jun Igarashi(RIKEN), Hiroshi Yamaura(The University of Electro- Communications), Tadashi Yamazaki(The University of Electro- Communications)

Abstract(Click to expand)

The next generation supercomputers with exaflops levels of computational performance in 2020’s are estimated to be able to perform human-scale whole-brain spiking neural networks. However, it remains unclear how we can efficiently perform communication of increasing spike data among compute nodes and load balancing for the heterogeneous structure of the brain in whole brain simulation. We conducted a feasibility study of efficient parallelization and communication methods of a brain model using K computer with computational performance of 11 petaflops.In the mammalian brain, the cortex and the cerebellum include 99 % of neurons and have layered sheet-like structure. They are densely wired within the regions and sparsely across distant regions. Therefore, an efficient parallel computing of layered sheet-like of spiking neural networks with the dense-neighbor and long-range-sparse distant connections is essential for realizing human-scale whole-brain simulation from the viewpoint of calculation amount. Taking into account the anatomical features of the brain, we considered an application of tile partitioning parallelization, which assigns compute nodes with partitioned tiles of a layered sheet-like neural network. We tested the parallelization method by applying it to a realistic spiking neural network model of the cortico-thalamo-cerebellar circuit using in-house simulator, MONET (Mille-feuille like Organization NEural neTwork). The cortico-thalamo-cerebellar circuit was developed based on anatomical and electrophysiological features of the brain regions. We introduced a method of reducing communication frequency of spike data between distant regions that exploits its long signal transmission delay of long-range connections.We measured calculation time of simulation for 1 second of biological time for various network sizes of cortico-thalamo-cerebellar model and compute nodes. We assigned a cortical tile with 45 thousand neurons, a thalamic tile with 2 thousand neurons and a cerebellar tile with 200 thousand neurons per compute nodes and tested weak scaling performance. The results showed good weak scaling performance for 63 million to 1 billion of neurons of cortico-thalamo-cerebellar model calculated by 768 to 12288 compute nodes. The result suggests that the combination of the tile partitioning parallelization and the reduction method of communication frequency may lead to human-scale whole-brain simulation on next generation exascale supercomputers.
14 Piecewise polynomial approximation algorithm for short-range intermolecular interaction on wide SIMD architectures

Kentaro Nomura(RIKEN Center for Computational Science), Yutaka Maruyama(RIKEN Center for Computational Science), Keigo Nitadori(RIKEN Center for Computational Science), Jun Makino(RIKEN Center for Computational Science)

Abstract(Click to expand)

Molecular dynamics (MD) simulations have become an indispensable tool in material science and biology. To extend the range of applicable target systems, it is important to accelerate MD simulations. In MD simulations, the calculation of two-body intermolecular interaction is the dominant part in the simulation time. The SIMD width of modern high-performance CPUs is becoming wider to increase the peak performance and impprove the power efficiency. For example, both Intel Xeon and Fujitsu A64FX have the SIMD units with the width of 512 bits. Thus, they can perform SIMD operations to either eight double-precision words or 16 single-precision words. Intel and ARM CPU has the instruction to use SIMD registers as table. Here, we present the performance of a piecewise polynomial approximation (PPA) technique we developed to accelerate the calculation of short-range intermolecular interactions. In this method, the SIMD registers are used as a table which hold the coefficients of the piecewise interpolation polynomials. We implemented PPA for the calculation of short-range interaction part of the particle mesh Ewald method. PPA is 10% faster than the best previously known SIMD implementation on Intel Skylake Xeon processor.

Disaster prevention and global climate

15 Data assimilation and forecast experiments for the record-breaking rainfall event in Japan in July 2018 with NICAM-LETKF at 112-km and 28-km resolution

Koji Terasaki(RIKEN, R-CCS), Takemasa Miyoshi(RIKEN, R-CCS)

Abstract(Click to expand)

In July 2018, an active Baiu front caused record-breaking rainfalls and disasters in broad areas in western Japan. This study performs data assimilation and forecast experiments using the NICAM-LETKF system (Terasaki et al. 2015, Terasaki et al. 2017) at 112-km and 28-km resolution with 32 ensemble members. The computational efficiency is essential to run the high-resolution data assimilation cycle. Yashiro et al. (2016) developed a new NICAM-LETKF system to reduce the computational cost of the LETKF part by considering a throughput-aware framework. Data assimilation experiments are performed with conventional observations and advanced microwave sounding unit-A (AMSU-A) satellite radiances for one month starting at 0000 UTC 10 June 2018. Next, forecast experiments are initialized every day at 0000 UTC on 1-5 July 2018.The results show that the 28-km resolution experiment outperforms the 112-km resolution experiment for the location and intensity of the heavy rainfall, although the precipitation amount is significantly underestimated compared with the observation. More than a half of the computational time of the LETKF was occupied by file I/O in the previous NICAM-LETKF system. The new NICAM-LETKF system successfully accelerated by reducing the I/O time to be about 20% of the computational time of the LETKF.
16 Data assimilation experiments with MODIS LAI observations and the dynamic global vegetation model SEIB-DGVM over Siberia

Hazuki Arakida(RIKEN), Shunji Kotsuki(RIKEN), Shigenori Otsuka(RIKEN), Yohei Sawada(Meteorological Research Institute, RIKEN), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

In the previous study, Arakida et al. [2017] developed a data assimilation system based on a particle filter approach with a dynamic global vegetation model known as the SEIB-DGVM, and assimilated the satellite-based MODIS LAI observations successfully. This study extended the previous study to a large domain in Siberia and estimated the state variables including carbon flux, water flux, heat flux, vegetation structure, and parameters related to the phenology of the deciduous needle leaved tree and grass. The initial perturbation of the parameters produced much larger LAI than the observed LAI. DA reduced the LAI greatly by optimizing the parameters and made the estimated LAI very close to the observation. This suggests that the DA system work properly at the large domain. Corresponding to the reduction of LAI, the estimated vegetation functions and structures were also reduced greatly. As a result, most of the estimated variables were highly correlated to the observed LAI and to the estimated values of other previous studies adopting different approaches. This study suggested the potential of the DA system to estimate the vegetation structure as well as vegetation functions in a large area. Arakida et al. (2017), Non-Gaussian data assimilation of satellite-based leaf area index observations with an individual-based dynamic global vegetation model, Nonlinear Proc. Geoph., 24, 553-567.
17 A preliminary analysis of a newly-developed regional ocean data assimilation system: a case of Tokyo Bay in summer

Kohei Takatama(RIKEN), Kenta Kurosawa(RIKEN), Yusuke Uchiyama(Kobe University), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

This study shows preliminary results from a newly-developed regional ocean data assimilation system, ROMS-LETKF, which consists of a regional ocean model ROMS and the local ensemble transform Kalman filter (LETKF). This system was applied for Tokyo Bay at 500-m resolution with 21 ensemble members and forced by an atmospheric analysis data from the Japan Meteorological Agency Mesoscale Model (JMA-MSM). We performed a data assimilation cycle experiment for a month starting at 0000 UTC July 1, 2016 after a 1-month spin-up. We used in-situ observed temperatures and salinities at a 1-hour interval and surface currents observed by high-frequency radars at a 15-minute interval. These observation data are provided from the Tokyo Bay Environmental Information Center.The ROMS-LETKF system works well and improves the reproducibility of the physical fields in Tokyo Bay. However, imbalance of heat between the ocean and the atmosphere causes a cold bias for ocean surface. The bias may be more evident in Tokyo Bay due to its shallow ocean floor and small exchange of water with the outer ocean. These conditions make the prescribed atmospheric forcing have a large effect. Therefore, our future research will focus on atmosphere-ocean coupled data assimilation.
18 Assimilating every 30-second phased array weather radar data in a torrential rainfall event on July 6, 2018 around Kobe city

Yasumitsu Maejima(RIKEN), Shigenori Otsuka(RIKEN), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

To investigate the impact of every 30-second phased array weather radar (PAWR; Yoshikawa et al. 2013, Ushio et al. 2014) observation on a simulation of a severe rainfall event occurred on July 6, 2018 around Kobe city, we perform 30-second-update 100-m-mesh data assimilation (DA) experiments using the Local Ensemble Transform Kalman Filter with the Scalable Computing for Advanced Library and Environment regional numerical weather prediction model. Two experiments were performed: the test experiment with every 30-second PAWR observation (TEST), and the other without observation (NO-DA).The TEST analysis shows intense rainfalls with detailed structure of active convection, better matching with the PAWR observation compared to NO-DA analysis. In the forecast experiment, the forecast initialized by the ensemble mean analysis of TEST is skillful for 20 minutes compared with NO-DA, although the skill is decreased rapidly. The results suggest that the PAWR DA have a potential to improve the numerical simulation for this torrential rainfall event.
19 30-second cycle LETKF assimilation of dual phased array weather radar observations to short-range convective forecasts

James Taylor(RIKEN), Guo-Yuan Lien(RIKEN), Shinsuke Satoh(NICT), Takemasa Miyoshi(RIKEN), Yasumitsu Maejima(RIKEN)

Abstract(Click to expand)

The assimilation of Doppler velocity and reflectivity observations from phased array weather radar (PAWR) has been widely studied for the use of short-range numerical weather prediction (NWP) and has been found to have positive impact to analyses and forecasts (e.g., Maejima et al. 2017). However, these studies assimilated observations from a single PAWR and the use of multiple PAWR observations for NWP has not yet been explored. With the recent development of PAWR at sites in Osaka and Kobe a common observation region exists where we are able to observe convective storms across an area where they can develop very rapidly bringing intense, hazardous rainfall. This study represents the first attempt at assimilating dual PAWR observations for the purpose of improved short-range weather forecasts of a sudden convective rainfall event. We focus on a case that occurred in Osaka on 20 August 2016, which generated heavy rainfall and was well observed by both radars. Simulations are performed with 30-second-cycling of PAWR observations within a high-resolution 100-m mesh using the SCALE-LETKF system (Lien et al. 2017). We aim to develop an effective data assimilation method which fully exploits the availability of having two PAWR systems to observe a single convective rainfall event and show how the data can be optimally combined to improve analyses and short-range forecasts compared to assimilating observations from a single PAWR.
20 Near-real-time SCALE-LETKF forecasts of the record breaking rainfall in Japan in July 2018

Takumi Honda(RIKEN), Guo-Yuan Lien(CWB), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

In July 2018, a stationary precipitation band associated with the Baiu front induced a record breaking rainfall and caused catastrophic destruction in Japan. This event was successfully captured by the near-real-time (NRT) SCALE-LETKF system (Lien et al. 2017) consisting of the Scalable Computing for Advanced Library and Environment-Regional Model (SCALE-RM, Nishizawa et al. 2015; Sato et al. 2015) and the Local Ensemble Transform Kalman Filter (LETKF, Hunt et al. 2007; Miyoshi and Yamane 2007). This system has been continuously operated since 2015 with an 18-km mesh model domain and the ensemble size of 50. In the NRT SCALE-LETKF system, only conventional observations are assimilated every 6 hours. By conducting a series of 50-member ensemble forecasts from the 6-hourly SCALE-LETKF analyses, this study aims to investigate predictability of this torrential rainfall event and important factors that contributed to the heavy precipitation. In general, the NRT SCALE-LETKF system provides skillful ensemble forecasts of the rainfall a few days in advance. Interestingly, the forecast skill exhibits sudden improvement due to assimilating conventional observations at a single location far southwest from the peak accumulated rainfall location. Forecast differences and ensemble correlations suggest that an extratropical cyclone over the Sea of Japan and a low-level trough near Taiwan play important roles in determining the front location.
21 Assimilating fractions of precipitation area: an idealized study with an intermediate AGCM

Shigenori Otsuka(RIKEN), Taeka Awazu(KS Solutions), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

Assimilating precipitation observations into an atmospheric model is a challenging issue due to nonlinear processes and non-Gaussian PDF of precipitation. Previous studies adopted variable transformation techniques such as the logarithmic transformation and the Gaussian transformation. Here we propose an alternative approach: assimilating fractions of precipitation area based on the precipitation map from remote-sensing techniques such as radars and satellites. Within n-by-n grid points centered at the analysis point, the number of grid points exceeding a given threshold of precipitation rate is assumed as the "observation." Applying this observation model to the observed and simulated precipitation fields, we perform data assimilation. In this presentation, idealized experiments with an intermediate atmospheric general circulation model known as the SPEEDY and the local ensemble transform Kalman filter (LETKF) will be presented.
22 NICAM-Chem : Atmospheric chemical transport model based on the non-hydrostatic icosahedral atmospheric model

Yousuke Yamashita(Japan Agency for Marine-Earth Science and Technology), Masayuki Takigawa(Japan Agency for Marine-Earth Science and Technology), Daisuke Goto(National Institute for Environmental Studies), Hisashi Yashiro(RIKEN), Kentaroh Suzuki(The University of Tokyo), Masaki Satoh(The University of Tokyo), Yugo Kanaya(Japan Agency for Marine-Earth Science and Technology), Fumikazu Taketani(Japan Agency for Marine-Earth Science and Technology), Takuma Miyakawa(Japan Agency for Marine-Earth Science and Technology)

Abstract(Click to expand)

The Non-hydrostatic ICosahedral Atmospheric Model (NICAM)-Chem is developed and used in this study. The NICAM has been developed to perform cloud-resolving simulations and has been used for global simulations of cloud and precipitation (Tomita and Satoh 2004; Satoh et al. 2008, 2014). The chemistry module is coupled with the NICAM to be included impact of atmospheric aerosol and chemistry (Takemura et al. 2000; 2002; 2009; Suzuki et al. 2008; Goto et al. 2011). The latest horizontal resolution of NICAM-Chem is 56 km. This model was developed to include the forest fire emission with daily time-scales. One noticeable advantage was achieved by replacing the model’s injection height of forest fire events by the observational injection height using CAMS Global Fire Assimilation System (GFAS) dataset, while the emission scheme of previous model used constant injection height about 3 km. The NICAM-Chem simulations are performed with K computer to reproduce the aerosol transport after the large and continuous forest fire emission occurred around Lake Baikal of Siberia in September 2016. We successfully reproduce the maximum of carbon concentration in 25–26 September around Aleutian Islands, in agreements with the observation in Arctic cruise of R/V Mirai (MR16-06). To estimate anthropogenic BC impact, the simulation without fire emission was performed. Without fire emission, the BC concentration in 25–26 September is much smaller than observation, indicating small impact of anthropogenic BC on this event. Results also indicate the transport pathway and the high BC concentration area moves from Siberia (9/20-22) to Aleutian Islands (9/25-27) through Northeast China (9/23) and the Kamchatka (9/24). In 24–27, the low-pressure system is developed in Kamchatka and Aleutian and the wet deposion of BC is calculated, especially in the area behind the cold front of low-pressure system. These results indicate that the NICAM-Chem is capable of simulating fine scales transport processes of BC.
23 Can hydrological observations improve global NWP in land-atmosphere-coupled data assimilation?

Kenta Kurosawa(RIKEN), Shunji Kotsuki(RIKEN), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

High Performance Computation (HPC) is a crucial infrastructure for developing numerical weather prediction (NWP) systems. This study aims to improve global weather forecasts by assimilating hydrological observations, which have not been used in typical NWP systems. For that purpose, we developed a land-atmosphere-coupled data assimilation system by extending the global atmospheric data assimilation system composed of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) and Local Ensemble Transform Kalman Filter (LETKF). The NICAM incorporates the Minimal Advanced Treatments of Surface Interaction and RunOff (MATSIRO) as the land surface model. The new system can update state variables of the MATSHIRO by assimilating hydrological observations.Satellite instruments can measure several hydrological parameters such as soil moisture, surface skin temperature and snow amount. To avoid complexity of quality control of such satellite observation data, this study assimilates hydrological parameters obtained from global land data assimilation systems (GLDAS) as the first step. Our preliminary experiments show proper, stable performance; assimilating soil moisture data successfully reduces errors in those variables relative to the GLDAS. This poster presents the most recent results by the time of the symposium.
24 Model Parameter Estimation with Data Assimilation using NICAM-LETKF

Shunji Kotsuki(RIKEN Advanced Institute for Computational Science), Yousuke Sato(Nagoya University), Koji Terasaki(RIKEN Advanced Institute for Computational Science), Hisashi Yashiro(RIKEN Advanced Institute for Computational Science), Hirofumi Tomita(RIKEN Advanced Institute for Computational Science), Masaki Satoh(RIKEN Advanced Institute for Computational Science), Takemasa Miyoshi(RIKEN Advanced Institute for Computational Science)

Abstract(Click to expand)

Estimating optimal parameters of a simulation model is a widely explored research in the HPC community, where machine learning techniques are often used. This study applies data assimilation to optimize a parameter of a numerical weather prediction (NWP) model. Kotsuki et al. (2018a, JGR) succeeded in improving global precipitation forecasts at 112-km-resolution NICAM (Nonhydrostatic ICosahedral Atmospheric Model) by estimating a parameter called B1 of Berry (1967)’s large-scale condensation scheme using satellite-observed precipitation data and the Local Ensemble Transform Kalman Filter (LETKF). Extending the previous study, this study explores to improve the forecasts further using other satellite observations. This study estimates the parameter B1 as a global-constant parameter with cloud liquid water (CLW) data observed by GCOM-W/AMSR2. The parameter estimation successfully reduces excessive bias in CLW although precipitation forecasts are degraded. In addition, this study extends to estimate spatial distributions of the B1 parameter. The spatially-varying B1 parameter shows the best agreement to the spatial pattern of observed LWP. This presentation will include the most recent progress up to the time of the symposium.
25 A 0.56°-resolution global data assimilation of multi-species satellite measurements for atmospheric composition study

Takashi Sekiya(JAMSTEC), Kazuyuki Miyazaki(JAMSTEC/JPL-CalTech), Koji Ogochi(JAMSTEC), Kengo Sudo(Nagoya University/JAMSTEC), Masayuki Takigawa(JAMSTEC), Henk Eskes(KNMI), Folkert Boersma(KNMI)

Abstract(Click to expand)

Information on atmospheric composition such as tropospheric ozone and its precursors are important for human health, ecosystem, and climate studies. The combined use of satellite measurements of ozone and its precursors has great potential to provide comprehensive information on the atmospheric composition and chemistry on regional to global scales. Previous studies have demonstrated the capability of advanced multi-species satellite data assimilation to simultaneously optimize the chemical concentrations of various species and emissions of several ozone precursors at relatively low resolutions (at 1°‒4° resolutions). In this study, we conducted a high-resolution global data assimilation (at 0.56° resolution) using an ensemble Kalman filter approach (Miyazaki et al., 2015) and high-resolution global chemical transport model (Sekiya et al., 2018). The computational efficiency of high-resolution data assimilation system on the Earth Simulator system was improved by tuning data assimilation code (e.g., vectorization) with increasing computational file input/output (I/O) efficiency (totally, 9 time faster than our original system). We assimilated multi-species retrievals (ozone, NO2, CO, HNO3, and SO2) from multiple satellite sensors (OMI, GOME-2, SCIAMACHY, TES, MOPITT, and MLS) during July 2008. The 0.56°-resolution data assimilation reduced the root mean square error (RMSE) of tropospheric NO2 column at the 46 mega cities over the globe against OMI by 16% compared to 1.1° resolution. The RMSE of surface NO2 at large cities against world-wide monitoring networks (AirBase, AQS, and Asian networks) was also reduced by 20% at 0.56° resolution compared to 1.1° resolution. As compared to surface NOx emission estimation at 1.1° resolution, the estimated emission over the polluted areas were lower by 5-15% at 0.56° resolution, primarily because of resolution-dependent model biases of tropospheric NO2 associated with non-linear transport and chemistry processes at small scale. These results suggest the potential of using the 0.56°-resolution data assimilation system for making better use of more advanced high-resolution satellite retrievals provided by low Earth orbit and geostationary satellites, which will benefit atmospheric composition studies on various spatial scales.
26 Optimizing Hydroelectric Dam Operations with Machine Learning

Marimo Ohhigashi(RIKEN), Shunji Kotsuki(RIKEN), Shohei Takino(Tokyo Electric Power Company Holdings), Takemasa Miyoshi(RIKEN)

Abstract(Click to expand)

Hydroelectric power generation, which converts the potential energy of water to electric energy by discharging stored water through generators, is an important renewable energy resource. To avoid overflows, it is necessary to release water in advance when flood is predicted due to heavy rains. Better weather prediction would improve prediction capabilities of river inflows and subsequent dam operations, so that we can reduce unnecessary water release for more efficient hydroelectric power generation.This study aims to achieve more efficient dam operations using state-of-the-art machine learning techniques. For that purpose, we develop successive three machines as follows. The first machine improves precipitation forecasts by numerical weather prediction, extending the traditional model output statistics (MOS) methods. The second machine emulates a river runoff model and predicts the river inflow using the precipitation forecasts from the first machine. In the first and second machines, we apply supervised learning using observed radar and river inflow data as the references, respectively. Using the reinforcement learning technique, the third machine aims to maximize the total power generation following operational restrictions for safety. The predicted river inflow by the second machine is used in the third machine as input data. We develop these three machines for a certain river catchment. This poster presents our ongoing work on developing machine-learning-based dam operations.
27 Parallel implementation of dynamic traffic assignment by asynchronous iteration process

Takamasa Iryo(Kobe University), Kazuki Fukuda(Kobe University), Junji Urata(The University of Tokyo), Genaro Jr. Peque(Kobe University), Lalith Wijerathne(The University of Tokyo), Wasuwat Petprakob(The University of Tokyo)

Abstract(Click to expand)

A road network is likely to be severely congested after a major earthquake. Damages on infrastructure and buildings significantly reduces capacity of a road network. The deterioration of a road transport system will delay a lot of travels including essential ones such as rescues of victims and interfere a recovery from a disaster. Assessing congestion of a road network in a recovery process from a disaster is essential to maintain disaster resilience. Calculating a traffic flow by a computer is a popular approach to assess road congestion. Considering the time-dependent property of a traffic flow and its congestion in an urban road network, it is natural to employ the dynamic traffic assignment (DTA) technique to evaluate congestion and queues on a road network. A number of studies and commercial software packages implement parallelism for DTA, while typical implementations are subject to frequent communications between CPUs. For the calculation of dynamic flow propagations, they basically divide a DTA problem into several sub-networks so that each core can calculate flow propagations of the links in each sub-network in parallel. However, it is apparent that the flow propagations between sub-networks continuously occur during the calculation, forcing the CPUs to communicate frequently (e.g. every second in a study duration, which can be several hours or more). Communicating data between cores incur long latency time when the transfer is set up. Hence, it is necessary to transfer a larger chunk of data at lesser frequency to communicate in a reasonable speed. This study employs the asynchronous iterative approach to reduce the frequency of communications. In the algorithm, a traffic flow is regarded as a continuous flow like a fluid flow. Dynamics of a traffic flow is described by a partial differential equation system called LWR (Lighthill-Whitham-Richards) model. To parallelise the calculation, a road network is divided into a number of sub-networks and they are assigned to each CPU. A traffic flow pattern of each link is independently calculated by the LWR model for the entire study duration (e.g. 12 hours). Then, inconsistency of traffic flow patterns between links is corrected by transmitting information of traffic flow patterns for the entire study duration at once between links. This process is repeated until the inconsistency vanishes.The proposed algorithm has been implemented on K. The presentation will include details of the algorithm and performance measures such as scalability in a test network consisting of 0.4 million links.
28 Link travel time approximation for large-scale dynamic traffic simulations

Genaro Jr Peque(Kobe University), Hiro Harada(Kobe University), Takamasa Iryo(Kobe University)

Abstract(Click to expand)

Large-scale dynamic traffic simulations are becoming widespread partly due to the exponential growth of the computer's processing power, memory capacity and parallelization capability. Along with it is the increasing need to manage the sizeable amount of raw data it generates. Typically, big data reduction techniques are used to decrease redundant, inconsistent and noisy data as it is perceived to be more useful than the raw data itself. However, these methods are normally performed independently so it wouldn’t compete with the simulation’s computing and memory resources. A major challenge is when it is integrated into the simulation process and executed numerous times since it needs to be simple, fast and efficient both in space and time.In this study, we are interested in reducing the size of the link travel time data for route planning during a dynamic traffic simulation. The simulation is implemented in a parallel computer with a distributed memory architecture. Since travel time is a function of a single variable that is strictly monotone (time), the problem can be defined as a piecewise linear approximation of the link travel time data. There are many existing algorithms developed for this problem but most are interested in retaining the data’s significant features. Moreover, the space and time complexities of these algorithms are usually dependent on the input data complexity such as the number of sampled data points, sampled data’s shape and irregularity of sampling intervals.Given that the travel time for route planning is retrieved along the y-axis, we propose a piecewise linear approximation algorithm that focuses only on minimizing the error along the y-axis. More specifically, the idea is to use linear interpolation to minimize the error of the real and approximated link travel time by minimizing their Euclidean distance along the y-axis where travel time values are retrieved by each driver for route planning. Since linear interpolation calculates points between any of the given input points, the estimated link travel time values are assured to be bounded by these points. An important aspect of the algorithm is its capability to restrict the maximum absolute error bound which avoids theoretically inconsistent results not accounted for by the dynamic traffic simulation model. Additionally, it has a low space and time complexity which is essential for a parallel computer with distributed memory architecture. Finally, using a 10x10 grid network with variable link travel time data complexities and absolute error bounds, the dynamic traffic simulation results show that the algorithm achieves around 80%−99% of link travel time data reduction using a small amount of computational resource.
29 Evaluation procedure for uncertainty source due to GCM projections in downscaled regional climate

Sachiho Adachi(RIKEN), Seiya Nishizawa(RIKEN), Kazuto Ando(RIKEN), Tsuyoshi Yamaura(RIKEN), Ryuji Yoshida(NOAA Earth System Research Laboratory (ESRL)), Hisashi Yashiro(RIKEN), Yoshiyuki Kajikawa(RIKEN), Hirofumi Tomita(RIKEN)

Abstract(Click to expand)

Regional climate model (RCM) is a necessary tool for not only projecting future regional climate, but also evaluating a mechanism causing the regional climate change. The dynamical downscaling method projects future regional climate using an RCM with boundary condition provided from a general circulation model (GCM). In general, the downscaled regional climate projections include uncertainties due to differences in emission scenario, GCM projections as boundary condition of RCM, and RCM itself used for downscaling. This study focuses on uncertainty due to GCM projections. The large-scale atmospheric state provided from a GCM can be divided into two components: mean state and perturbations. The regional climate is affected by changes in both components. The changes in large-scale mean state correspond to changes in atmospheric structure and thermodynamic effect associated with enriched atmospheric water vapor. On the other hand, the changes in perturbations roughly correspond to changes in properties of weather disturbances including the Baiu front and tropical and extratropical cyclones. Adachi et al. (2017) proposed a new procedure. It allows us to quantitatively evaluate the influences of changes in large-scale mean state and perturbation on the regional climate change. In addition, it can evaluate the influence of interaction between mean state and perturbation on the regional climate. Namely, the procedure divides factors causing regional climate change into three: changes in large-scale mean state, changes in perturbation, and the resulting nonlinearity between them. They used the procedure to evaluate a mechanism causing the regional climate change, whereas this study demonstrates the applicability of the procedure to evaluating property of uncertainties due to GCM projections in regional climate projection. The procedure will show which component, mean state or perturbation, makes spread of regional climate projections become large. The result help us to understand property of uncertainty in future regional climates projected by multi-GCM projections.
30 Factor analysis in downscaled regional climate change

Yoshiyuki Kajikawa(RIKEN), Kazuto Ando(RIKEN), Sachiho Adachi(RIKEN), Seiya Nishizawa(RIKEN), Tsuyoshi Yamaura(RIKEN)

Abstract(Click to expand)

It is worthwhile and rather necessary to estimate the future climate change in regional scale for urban design and disaster prevention and mitigation. Especially, future precipitation change of amount, intensity, and frequency is highly concerned. The dynamical downscaling by using regional climate model (RCM) is one of the effective methods to estimate and evaluate the future climate change, and it has been used in many previous studies. However, the comprehensive understanding the cause of future precipitation change remains as an issue. Specifically, it is important to estimate the effect of synoptic disturbance, which we extracted Tropical cyclone (TC), mid-latitude Low and Baiu Front in this study, for the future precipitation change and the other precipitation change unaffected by the synoptic disturbance. It is necessary to examine how much difference in above estimation between General circulation model (GCM) results and downscaled results by RCM. Here, we defined the precipitation affected by TC, mid-latitude Low and Baiu and applied them to GCM and RCM results in the area over the western Japan as a demonstration. The downscaled climate simulation was conducted by SCALE-RM on K-Computer. For the future precipitation change, we found the large contribution of precipitation associated with the synoptic disturbance in the GCM result, while it is not negligible to consider the precipitation change unaffected by the synoptic disturbance in the downscaled results. This implies we should consider the local precipitation more for the future change. The detailed phenomenon providing that precipitation will be discussed and highlighted.
31 A "household budget" method for the optimization of huge legacy code: a case of weather application

Hisashi Yashiro(RIKEN Center for Computational Science), Hirofumi Tomita(RIKEN Center for Computational Science)

Abstract(Click to expand)

An application code with the long history is often used for production-runs in each scientific domain. Many features are integrated into those legacy codes by numerous developers. It is difficult to grasp the total computational performance of them because many sections to deteriorate performance are hidden in the codes. Especially in the weather and climate domain, there are no hotspots in the application codes, and low-performance sections are scattered everywhere in the code. We call such performance statistics "flat-profile." The cost-ranking method, which is usually used for the performance analysis, is not sufficient to reduce the little time-wasting lines.We introduce the method and strategy to optimize the huge, non-hotspot, memory-bounded codes. We collect information such as FLOPS, memory throughput, and elapse time of all the sections passing at the time of production-run. We subdivide the measurement sections as needed. An index "seconds per FLOPS" is useful to detect the low-performance sections. This method has been applied to the optimization of the global climate model NICAM (Nonhydrostatic ICosahedral Atmospheric Model) on the K computer and Fujitsu FX100. We show the result of the performance enhancement and introduce some cases of the source code caused the performance deterioration.
32 Tsukuba tornado with Fujita scale 3 reproduced by super-computer 'K'

Hiromu Seko(Meteorological Research Institute, Japan Agency for Marine-Earth Science and Technology), Wataru Mashiko(Meteorological Research Institute), Sho Yokota(Meteorological Research Institute), Tetsurou Tamura(Tokyo Institute of Technology), Hiroshi Niino(The University of Tokyo)

Abstract(Click to expand)

Severe weather phenomena that cause disasters, such as heavy rainfalls and tornadoes, occurred year after year in Japan. To reduce victims of local heavy rainfalls or wind gusts, their accurate forecasts including occurrence probabilities are needed. It is also important to clarify the impact of wind gusts on living space near the surface. The group of Post-K Priority Issue 4 is developing the forecast model and the data assimilation system to forecast the heavy rainfall and tornadoes more accurately. We developed the NHM-LETKF data assimilation system for the Polarimetric radars and for the dense surface observation network, in addition to the conventional observations of JMA, and applied them to the F3 tornado that occurred on 6th May 2012 near Tsukuba City. As a result, the tornado was successfully reproduced. The ensemble forecasts with the grid interval of 50 m using the super-computer 'K' showed important factors that determine the tornadogenesis, besides the probability distribution of the tornadogenesis: In this case, the water vapor of low-level inflow and the meso-cyclone at the height of 1 km largely affected the tornadogenesis. Moreover, the super high-resolution simulation with a horizontal grid spacing of 10 m reproduced a multiple-vortex structure of the tornado, which is characterized by several subvortices accompanied with strong horizontal convergence and strong updraft near the surface. These distributions reproduced by the weather forecast model were further used as the outer model fields of the LES, which can express the building as well as eddies. The wind and pressure fields around the buildings reproduced by the LES showed the influence of the tornado while passing near the buildings. These are the first results in these study fields that can be obtained only by super computer 'K'.

Energy issues

33 Development of Exascale Fusion Plasma Turbulence Simulations for Post-K

Yasuhiro Idomura(Japan Atomic Energy Agency), Takuya Ina(Japan Atomic Energy Agency), Kevin Obrejan(Japan Atomic Energy Agency), Yuuichi Asahi(National Institutes for Quantum and Radiological Science and Technology), Seikichi Matsuoka(National Institute for Fusion Science), Toshiyuki Imamura(Riken)

Abstract(Click to expand)

Turbulent transport is one of key issues in fusion science. To address this issue via a five dimensional (5D) gyrokinetic model, the Gyrokinetic Toroidal 5D full-f Eulerian code GT5D [Idomura et al., Nucl. Fusion (2009)] has been developed. On the K-computer, inter-node parallelization techniques such as multi-dimensional/-layer domain decomposition and communication-computation overlap were developed, and strong scaling of GT5D was improved up to 73,728 nodes [Idomura et al., Int. J. HPC Appl. 2014, J. Comput. Phys. 2016]. The computing power enabled us to study ITER relevant issues such as the plasma size scaling of turbulent transport and transient plasma responses induced by turbulence transition. However, extensions of GT5D towards burning plasmas including kinetic electrons and multi-species ions require exascale computing. Under the post-K project, we have developed computing techniques for the next generation computing platforms based on many core processors. In this talk, we discuss computational challenges related to complicated intra-processor memory hierarchy and limited inter-node communication performance compared with accelerated computation. The former issue is addressed by optimizing data access patterns of a stencil kernel on each many core architecture, and high performance gains are obtained on several many core architectures [Asahi et al., IEEE-TPDS 2017]. The latter issue is resolved by using advanced communication avoiding Krylov methods, which enables an order of magnitude reduction of collective communications and improves arithmetic intensity of main computing kernels [Idomura et al., ScalA17@SC17]. By applying these novel computing techniques, the performance of GT5D is dramatically improved on the latest many core platforms, and excellent strong scaling up to the full system size of the Oakforest-PACS (8,192 KNLs) is achieved.
34 Communication avoiding multigrid preconditioned conjugate gradient method for extreme scale multiphase CFD simulations

Susumu Yamada(Japan Atomic energy Agency), Naoyuki Onodera(Japan Atomic energy Agency), Takuya Ina(Japan Atomic energy Agency), Susumu Yamashita(Japan Atomic energy Agency), Yasuhiro Idomura(Japan Atomic energy Agency), Toshiyuki Imamura(RIKEN)

Abstract(Click to expand)

In order to analyze severe accidents in nuclear power plants and to estimate the resulting debris properties, Japan Atomic Energy Agency promotes the development of three dimensional multi-phase multi-component thermal hydraulic CFD code JUPITER [Yamashita et al, Nucl. Eng. Des. 2017]. JUPITER computes thermal-hydraulics of the molten material in nuclear reactors by the equations of continuity, Navier-Stokes, and energy, assuming Newtonian and incompressible viscous fluids. The dynamics of gas, liquid, and solid phases of reactor internal components are described by the volume of fluid function. The main computational cost comes from the pressure Poisson equation, which is ill-conditioned because of the extreme density contrast between gas and solid phases and the multi-scale dynamics of complicated flows and interfaces. In order to solve this kind of ill conditioned multi-scale problem, we have developed communication avoiding (CA) matrix solvers for future exascale systems such as the Post-K computer, where a communication bottleneck may appear because of accelerated computation. In this work, we discuss comparisons of the conventional conjugate gradient (CG) method, the CACG method [Mayumi et al., ScalA16@SC16], the Chebyshev basis CACG (CBCG) method [Idomura et al., LNCS 2018], and the CA multigrid (CAMG) method [Idomura et al., ScalA18@SC18]. CAMG has robust convergence properties regardless of the problem size, and shows both communication reduction and convergence improvement, leading to higher performance gain than CACG and CBCG, which achieve only the former. Extreme scale multiphase CFD simulations at 90 billion DOFs show that compared with CG, MGCG reduces the number of iterations to 1/800, and achieves 11.6x speedup with keeping excellent strong scaling up to the full system size of the Oakforest-PACS (8,000 KNLs).
35 Optimization of plasma electromagnetic particle simulation code PASMO

Hiroaki Ohtani(National Institute for Fusion Science), Yohei Miyake(Kobe University), Hiroshi Nakashima(Kyoto University), Ritoku Horiuchi(National Institute for Fusion Science), Shunsuke Usami(National Institute for Fusion Science)

Abstract(Click to expand)

In order to investigate plasma phenomena from the microscopic viewpoint, particle simulation is one of the useful methods. In the particle-in-cell (PIC) method [1], the dynamics of particles and electromagnetic fields are coupled, and the equation of motion and the Maxwell equations are solved selfconsistently. For performing the PIC simulation code on a distributed memory and multi-processor computer system with a distributed parallel algorithm, the simulation domain is decomposed simply and three-dimensionally under the distributed parallel processing by MPI, and distributed the information of particles under thread parallel processing by OpenMP and automatic parallelization. There are several difficulties to perform the large-scale PIC simulation [2]: The global calculation such as Poisson solver with FFT is needed, the load unbalance takes place in association with the nonuniform particle distribution, and the data in a memory is accessed randomly in the calculations of the electromagnetic filed on the particles and the current density from the position and velocity of the particles. In order to solve these problems, we adapted the charge conservation scheme [3], and introduced a dynamic load balancing library OhHelp [Nakashima] to our PIC simulation code PASMO. Recently, we are developing the algorithm which can reduce the random memory access. In this paper, we report the recent development and the performance of the optimized PASMO code on Fujitsu FX100.AcknowledgementsThis work was partially supported by the National Institute for Fusion Science (NIFS) Collaborative Research Program (NIFS17KNTS046, NIFS18KNSS102, and NIFS17KNXN335) and “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” in Japan.Reference[1] C. K. Birdsall and A. B. Langdon, “Plasma Physics via Computer Simulation”, McGraw-Hill, New York, 1985.[2] Y. Miyake and H. Nakashima, 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, (2013), 1107, DOI: 10.1109/TrustCom.2013.134.[3] T. Zh. Esirkepov: Computer Physics Communications, Vol.135, pp. 144-153 (2001).[4] H. Nakashima, Y. Miyake, H. Usui, and Y. Omura: Proc. Intl. Conf. Supercomputing, pp.90-99 (2009).
36 Extended kinetic-magnetohydrodynamic hybrid simulations of magnetically confined laboratory plasmas

Yasushi Todo(National Institute for Fusion Science), Masahiko Sato(National Institute for Fusion Science), Hao Wang(National Institute for Fusion Science), Ryosuke Seki(National Institute for Fusion Science)

Abstract(Click to expand)

Magnetohydrodynamics (MHD) is a theoretical framework that explains well the macroscopic behavior of plasmas. However, MHD is an unfinished framework for magnetically confined laboratory plasmas in tokamaks and helical devices, because the MHD pressure equation assumes sufficiently high collision frequency which is not valid for the high-temperature plasmas. One typical example that needs an extension of MHD is energetic-particle driven instabilities. Kinetic-MHD hybrid simulations for energetic particles interacting with an MHD fluid are useful tools to understand and predict energetic particle driven MHD instabilities. In the hybrid simulation models (e.g. [1]), the bulk plasma is described as an MHD fluid where kinetic effects of thermal ions and electrons are neglected. In this paper, we present two types of extended kinetic-MHD hybrid simulations which have been developed on the K computer towards the applications on the post-K computer. The first type is an extended kinetic-MHD hybrid simulation where thermal ion kinetic effects are taken into account in addition to those of energetic particles [2]. The gyrokinetic particle-in-cell simulation method is applied to both thermal ions and energetic particles. The second type is the “multi-phase hybrid simulation” in a longer time than the instability time scales [3]. The time step width of the kinetic-MHD hybrid simulation is limited by Courant condition for MHD fast wave. In the multi-phase hybrid simulation, we run alternately the hybrid simulation and the classical simulation where MHD perturbations are turned off. This enables us to simulate the energetic particle driven instabilities in the collisional time scales. It has been known that the LHD plasmas are more stable for pressure driven MHD instabilities than the predictions of MHD theory. We have applied the first type of simulation to the ballooning instabilities in LHD to investigate the kinetic effects of thermal ions. It is demonstrated that the kinetic response of thermal ions to the MHD perturbations is significanlty weaker than the adiabatic fluid response leading to the stabilization of the instability. The second type of simulation has been applied to the synchronized sudden growth of multiple AEs, which we call AE burst. This simulation successfully reproduced the AE bursts in the experiment. We have analyzed the energetic particle distribution function in phase space and clarified the physical mechanism of AE burst. [1] Y. Todo and T. Sato, Phys. Plasmas 5 (1998) 1321. [2] Y. Todo et al., in the 26th International Toki Conference (Dec. 5-8, 2017, Toki, Japan). [3] Y. Todo et al., Nucl. Fusion 54 (2014) 104012.
37 Communication reduced multi-time-step algorithm for the AMR-based lattice Boltzmann method on GPU-rich supercomputers

Naoyuki Onodera(Japan Atomic Energy Agency), Yasuhiro Idomura(Japan Atomic Energy Agency), Yussuf Ali(Japan Atomic Energy Agency), Takashi Shimokawabe(The University of Tokyo)

Abstract(Click to expand)

It is very important for nuclear security to predict environmental dynamics of radioactive substances considering the effect of buildings. In order to understand the details of the plume dispersion there, it is necessary to carry out real-time and large-scale Computational Fluid Dynamics (CFD) simulations. However, it is difficult to perform multi-scale CFD simulations with a uniform grid from the viewpoint of computational resources and calculation time. To resolve this issue, we developed an adaptive mesh refinement (AMR) method for the lattice Boltzmann method (LBM). Although the AMR method is effective for multi-scale simulations, it requires additional halo data communications at interfaces of regions with different grid resolutions. This leads to a severe communication bottleneck on GPU platforms and future exascale systems such as the Post-K computer.We have developed a communication reduced multi-time-step (CRMT) algorithm for a LBM based on a block-structured AMR. This algorithm is based on the temporal blocking method, and can improve computational efficiency by replacing a communication bottleneck with additional computation. The proposed method is easily applied to the explicit time integration scheme, and is implemented on an extreme scale airflow simulation code CityLBM. We evaluate the performance of the CRMT algorithm on GPU based supercomputers. Thanks to the CRMT algorithm, the communication cost is reduced by ∼ 64%, and weak and strong scaling are improved up to ∼ 200 GPUs on TSUBAME 3.0. The obtained performance indicates that real time airflow simulations for about 2km square area with the wind speed of ∼ 5m/s is feasible using 1m resolution. We conclude that the CRMT algorithm is indispensable for the AMR-LBM to realize a real time simulation on future exascale systems.
38 Alpha-cluster structure from no-core Monte Carlo shell

Takashi Abe(CNS, the University of Tokyo)

Abstract(Click to expand)

Owing to recent computational and methodological developments, the ab-initio approaches in nuclear physics have been widely developed. The Monte Carlo shell model is applied to ab-initio approach. The no-core Monte Carlo shell model is currently capable to calculate physical observables up to lower sd-shell region. As one of physics applications, the alpha-cluster structure in light nuclei is focused on from an ab-initio point of view.

Industrial competitiveness

39 Theoretical investigation of “Fullerene Like” Bowl Shaped Polyarenes Dy(III) Model Complexes

Tanu Sharma(Indian Institute of Technology Bombay, India), Mukesh Singh(Indian Institute of Technology Bombay, India), Radhika Gupta(Indian Institute of Technology Bombay, India), Goapalan Rajaraman(Indian Institute of Technology Bombay, India)

Abstract(Click to expand)

Long, Mills and Layfield et al have emphasized the importance of proper ligand environment around a lanthanide ion for obtaining record blocking temperature and creating a potentially applicable single molecular magnets (SMMs).[1] Our previous theoretical studies on Dy2@C79N and {DyOSc/DyODy}@C72/76/82 endohedral metallofullerenes (EMFs) suggest fullerene as a suitable candidate for providing strong axial ligand field environment around Dy(III) ion. One can attain as high as ~1500 cm-1 Ueff value.[2] Besides, high-symmetry, no-nuclear spin atoms such as H which is very common in coordination complexes, high-rigidity due to strong carbon-carbon bond make them more suitable to attain high blocking barrier (TB). Considering a fact that synthesis of these EMFs is a challenging task, we searched for an alternative “fullerene-like” bowl-shaped traditional coordination ligand and came across corannulene ligand system.Combined DFT and ab initio studies on several Dy-Corannulene based models suggest a possibility to attain Ucal value as high as ~800 cm-1 via 4th excited state. Atoms in a molecule (AIM), natural bonding orbital (NBO) together with molecular dynamics (MD) have been performed to explore all the structural and magnetic behaviour for all models. References [1]. (a) Goodwin, C. A. P.; Ortu, F.; Reta, D.; Chilton, N. F.; Mills, D. P., Nature 2017, 548, 439; (b) Guo, F.-S.; Day, B. M.; Chen, Y.-C.; Tong, M.-L.; Mansikkamäki, A.; Layfield, R. A., Science 2018 (DOI: 10.1126/science.aav0652); (c) Rinehart, J. D.; Long, J. R., Chem. Sci. 2011, 2, 2078.[2]. (a) Singh, M. K.; Rajaraman, G., Chem. Commun. 2016, 52, 14047; (b) Singh, M. K.; Yadav, N.; Rajaraman, G., Chem. Commun. 2015, 51, 17732.
40 Development of efficient calculation method of long-polymer chain’s chemical potential

Kazuo Yamada(Osaka University), Nobuyuki Matubayasi(Osaka University)

Abstract(Click to expand)

The polymer blends are widely used in industrial applications including home appliances, biomedical devices, and automotive products. They are commonly made by blending several different polymers together, and the properties of polymer blends are determined by the mixing free energy of blended polymers. The mixing free energy of blend is determined by the chemical potential changes of the species to be mixed, and its evaluation is of practical use to address the possibility of preparing such material as polymer blend. An atomistic computation of free energy is a challenging task, however, it is hard to compute straightforwardly it. This is because a polymer is highly flexible and structurally diverse due to a large number of intramolecular degrees of freedom. Therefore, an elaborate scheme needs to be established which exploits a specific nature of the polymeric structure. The method of chain increment was proposed by Kumar et.al. to compute the chemical potential of a polymer described by Lennard-Jones chain. It relies upon the structural feature that a monomer unit is repeated many times, and performs a free-energy calculation for introducing the interaction of each repeated unit. Inspired by their study, we develop a chain-increment method for computing the chemical potential of a polymer with all-atom model. The intermolecular interaction of a polymer of interest with the surrounding molecules is progressively introduced for repeated unit of monomer, and the free energy for turning on the interaction is treated within the framework of a solvation theory. In our computational scheme using solvation theory, the “solute” refers to the monomer unit to be incremented along the polymer chain and the “solvent” is set to the other polymer molecules. The “solvation” is then to introduce the interaction between the “solute” and “solvent”, and the “solvation free energy” is the incremental change in the (excess) chemical potential of the polymer. We employ the method of energy representation since it is necessary that the solvation theory be suitable for treating flexible molecules since polymer is a highly flexible species. We examine the melts of polyethylene, poly(methyl methacrylate), and polystyrene using all-atom molecular dynamics simulation and compute their excess chemical potentials as functions of the chain length. The free energy of chain increment stays constant beyond the degree of polymerization of 50. To assess the accuracy of the energy-representation method, we also calculated excess chemical potential using rigorous method of Widom insertion.
41 First-principles study of rare-earth magnet compounds

Takashi Miyake(National Institute of Adcanced Industrial Science and Technology)

Abstract(Click to expand)

Computational materials discovery is attracting interest recently. We report computational screening of rare-earth magnet compounds by first-principles calculation with the help of machine learning. The basic idea is to carry out high-throughput first-principles calculation of hypothetical compounds having various crystal structures and chemical compositions. To accelerate the screening, we construct a machine-learning model using kernel-ridge regression, which enables us to estimate materials properties efficiently. We use Orbital Field Matrix (OFM) [1] as a descriptor. In OFM, a crystal is divided by Voronoi polyhedra, and its local structure is expressed by a matrix using the information of electron configurations of constituent elements. Application to thousands of transition-metal compounds reveals that kernel-ridge regression with OFM reproduces the formation energy and local magnetic moments with high accuracy. Virtual screening of Nd-Fe-B systems using this technique will be presented. We also report subgroup relevance analysis of experimental Curie temperatures of rare-earth transition-metal bimetals [2].[1] Tien-Lam Pham et al., Sci. Technol. Adv. Mater. 18, 756 (2017); J. Chem. Phys. 148, 204106 (2018).[2] Hieu-Chi Dam et al., J. Phys. Soc. Jpn. 87, 113801 (2018).
42 First-Principles Local-Energy and Local-Stress Schemes to Understand and Design Micro-Structures of Structural Materials

Masanori Kohyama(National Institute of Advanced Industrial Science and Technology), Zhuo Xu(National Institute of Advanced Industrial Science and Technology), Shingo Tanaka(National Institute of Advanced Industrial Science and Technology)

Abstract(Click to expand)

The strength, plasticity and durability of structural metals are dominated by micro-structures consisting of grain boundaries (GBs), precipitate/metal interfaces, and dislocations with various distributions of solutes or additives. The understanding on the structure and properties of GBs, interfaces and dislocations with and without solute or additive atoms at atomic or electronic scales is essential. For this purpose, density-functional theory (DFT) calculations of supercells of GBs, interfaces and dislocations are quite effective. In conventional plane-wave DFT calculations, however, total energy and stress tensor are obtained as quantities integrated or averaged in the supercell, and we cannot clarify local distributions of energy and stress in the supercell. Thus we developed the computational schemes to obtain local energy and local stress inside a supercell [1,2] within the plane-wave PAW-GGA framework, as coded in QMAS package. In our schemes, energy and stress densities are integrated in each local region to remove the gauge-dependent terms, leading to the local energy and local stress of atoms, layers or clusters. Our schemes have been applied to various subjects of structural metals [3-5], and combined with machine-learning techniques [6], as multi-scale computations for structural materials. For example, local-stress changes were used to calculate local bulk moduli of Si-Fe clusters in Fe-Si alloys, so as to understand bulk-modulus changes of Fe-Si alloys depending on Si concentration [3]. The mechanism of GB segregation of a series of 3d-transition metal solutes at Fe GBs was clarified by the local-energy decomposition [4]. In first-principles tensile tests of metallic GBs, local-energy and local-stress analyses provided deep insights into the mechanism of deformation and failure of GBs [5]. Local energies at metallic GBs are effective as the data for machine-learning techniques to predict energies of various GBs in metals [6].Acknowledgement: We greatly thank Dr. Shoji Ishibashi, Prof. Yoshinori Shiihara, Dr. Tomoyuki Tamura, Dr. Hao Wang and Dr. Somesh Kr. Bhattacharya for fruitful discussion.[1]Y. Shiihara et al., Phys. Rev. B 81, 075441 (2010), [2]H. Wang et al., J. Phys. Condens. Matter 25, 305006 (2013), [3]S. Kr. Bhattacharya et al., Mat. Res. Express. 4, 116518 (2017), [4]Z. Xu et al., J. Phys. Condens. Matter 31 (2019) in print, [5]H. Wang et al., Modeling Simul. Mater. Sci. Eng. 25, 015005 (2017); J. Phys. Condens. Matter 31 (2019) in print, [6]T. Tamura et al., Modeling Simul. Mater. Sci. Eng. 25, 075003 (2017).
43 Large-scale ab-initio simulation for nano-optics based on time-dependent density functional theory

Mitsuharu Uemoto(Center for Computational Science, University of Tsukuba), Kazuhiro Yabana(Center for Computational Science, University of Tsukuba)

Abstract(Click to expand)

We have been developing a first-principles simulation packagefor electron dynamics, SALMON (Scalable Ab-initio Light-Mattersimulator for Optics and Nanoscience) [1-2]. It has beenimplemented in several computing environments such as K-computerand Oakforest-PACS [3], and has shown highly efficient and scalableperformance. We are now advancing preparations for the post-K system.One of unique features of the SALMON code is its capability todescribe light-wave propagation and electron dynamics simultaneously,which is called "Maxwell+TDDFT multiscale simulation" [3].In the method, electron dynamics is described by ab-initiotime-dependent density functional theory (TDDFT) [4] and electromagneticfields are described by Maxwell's equation.Recently, electromagnetism analysis methods have been extensivelyutilized for the analyses and design of optical devices. However,the conventional methods cannot account for optical nonlinearities suchas high harmonic generation and saturable absorptions, which areimportant in forefront optics utilizing intense laser fields. The TDDFTsuccessfully provides quantum-mechanical descriptions for opticalnonlinearities of solid media from first-principles.quantum mechanics [6].In our multiscale formalism, we use coordinate systems of two differentspatial scales. macroscopic grids for the propagating electromagnetic fieldsand the microscopic (atomic scale) grids for electron dynamics;individual TDDFT calculation is carried out at each macroscopic grid point.We have implemented the method so as to treat various dimensionality for themacroscopic electromagnetic fields: For one-dimensional case, propagationof laser pulses and nonlinear polarization in dielectric filmshas been studied. Recently, we have extended it to oblique propagation witharbitrary incident angles [7]. Two and three-dimensional calculations arechallenging testbed for current high-performance computing. we have appliedour method to a laser excitation of semiconducting nanostructures consistingof about 32,000 macroscopic grids at which TDDFT calculations are carried out.[1] SALMON-TDDFT Project: https://salmon-tddft.jp/[2] M. Noda et al., Comm. Comput. Phys. 235, 356-365 (2019).[3] Y. Hirokawa et al, In proceedings of ISC High Performance 2018,Lecture Notes in Computer Science, 10876, 202-205 (2018).[4] S.A. Sato et al, Phys. Rev. B 92, 1 (2015)[5] K. Yabana and G.F. Bertsch, Phys. Rev. B 54, 4484 (1996).[6] M. Uemoto and K. Yabana, J. Chem. Phys (on submission) arXiv:1810.06500[7] M. Uemoto and K. Yabana, Proceedings of the 29th Symposium ofAssociation for Condensed Matter Photophysics, pp.205-208 (2018).
44 Massively-parallel first-principles calculations for near field optics: wave vector excitations in silicon

Masashi Noda(University of Tsukuba), Kenji Iida(Institute for Molecular Science), Maiku Yamaguchi(The University of Tokyo), Kazuya Ishimura(Institute for Molecular Science), Takashi Yatsui(The University of Tokyo), Katsuyuki Nobusada(Institute for Molecular Science), Kazuhiro Yabana(University of Tsukuba)

Abstract(Click to expand)

In current frontiers of optical science, it is quite significant to understand electron dynamics in nano-materials induced by the electromagnetic fields in time domain. Electronic excitation by optical near fields (ONFs) is one of the phenomena. The ONF is a nonuniform, localized field generated around nano-particles or local nano-structures in bulk materials. Enormous enhancement of the electric fields has been intensively reported experimentally and theoretically. The ONFs themselves are expected to manifest interesting characteristics in nanoscale region. One of the phenomena is wave vector excitations. The physical mechanism underlying the idea is originated from the fact that the ONF has several orders of magnitudes larger components of finite wave vectors than those of the far field of the incident light. This is a natural consequence of the uncertainty principle for the ONF that is localized in a nanostructure. We previously showed that the wave vector excitation actually occurs in an analytical model [1]. In the present study, we perform first-principles calculations of the wave vector excitation in a realistic semiconductor system, silicon.To describe the process, we have been developing a program package SALMON (Scalable Ab-initio Light-Matter simulator for Optics and Nanoscience) that is based on first-principles time-dependent density functional theory [2,3]. SALMON solves time-dependent Kohn-Sham equation in real time using three-dimensional grid representation. The code is efficiently parallelized with respect to spatial grids, orbitals, and k-points. We have applied the code to wave vector excitations by the ONF in three silicon bilayers with a Si(111) surface terminated by hydrogen atoms. To describe electronic excitations accompanying wave vector changes that are promoted by the ONF, we calculated the system composed of 8,192 atoms. The calculation costs 13 hours using 4096 nodes on the K computer at RIKEN Center for Computational Science. We have found that the excitation by ONF is a few orders of magnitude larger than that by the far field. We also find the lowering of the absorption edge by the ONF excitation that is attributed to direct interband transitions with finite wave vector differences. These results indicate that the wave vector excitations by the ONF will realize higher detection efficiency of the silicon photo-detector.[1] M. Yamaguchi and K. Nobusada, Phys. Rev. B 93, 195111 (2016).[2] http://salmon-tddft.jp[3] M. Noda et al., Comput. Phys. Commun. 235, 356 (2019).

Basic science

45 Complex Langevin analysis of the spontaneous rotational symmetry breaking in the Euclidean type IIB matrix model

Konstantinos N. Anagnostopoulos(National Technical University of Athens), Takehiro Azuma(Setsunan University), Yuta Ito(KEK), Jun Nishimura(KEK, SOKENDAI), Toshiyuki Okubo(Meijo University), Stratos Kovalkov Papadoudis(National Technical University of Athens)

Abstract(Click to expand)

The type IIB matrix model has been proposed as a non-perturbative definition of superstring theory. In this work, we study the Euclidean version of this model in which extra dimensions can be dynamically compactified if a scenario of spontaneously breaking the SO(10) rotational symmetry is realized.The Euclidean IIB matrix model causes a very strong complex action problem due to the large fluctuations of the complex phase of the Pfaffian which appears after integrating out the fermions.In this work, we apply the Complex Langevin Method to the IIB matrix model in order to study the breakdown of the SO(10) rotational symmetry, and compare the result with that of the Gaussian Expansion Method.
46 CI+PT: A relativistic atomic code package combining confirmation interaction and Møller-Plesset perturbation theory for valence space

Charles Cheung(University of Delaware), Marianna Safronova(University of Delaware), Sergey Porsev(University of Delaware), Mikhail Kozlov(Petersburg Nuclear Physics Institute), Ilya Tupitsyn(St. Petersburg State University)

Abstract(Click to expand)

There is currently a need for the next-generation of relativistic atomic codes capable of high-precision calculations of the properties of atoms and ions in the middle columns of the periodic table. This stems from recent developments in experiments dealing with fields such as atomic clocks and studies of variations of fundamental constants. The goal of this work is to continue development of a broadly applicable atomic code based on a method combining configuration interaction (CI) and many-body perturbation theory (MBPT). In this extension of the CI+MBPT package, CI+PT improves and optimizes the CI valence space by using CI in a small subspace and calculating 2nd order Møller-Plesset (MP2) corrections for the complementary subspace. We start with a small CI space and calculate 2nd order MP corrections for all states of interest. At the same time, weights of configurations are calculated in the 1st order correction to wavefunctions. In the next iteration, all configurations with weights above some threshold are incorporated in a new CI subspace with the complementary subspace under MP2 corrections. This can be repeated for an optimal valence CI space. After the valence CI space is formed, MBPT corrections can be added to account for core-valence correlations in the standard manner for CI+MBPT.[1] M. G. Kozlov, S. G. Porsev, M. S. Safronova, and I. I. Tupitsyn, Computer Physics Communications 195, 199 (2015), ISSN 0010-4655
47 Vlasov-Poisson simulations of neutrinos in the large-scale structure formation

Satoshi Tanaka(University of Tsukuba), Kohji Yoshikawa(University of Tsukuba), Naoki Yoshida(Kavli IPMU, University of Tokyo)

Abstract(Click to expand)

We numerically investigate the dynamics of cosmic massive neutrinos in the large-scale structure formation in the universe. Here, we adopt a rather new numerical approach, Vlasov simulations in which we directly solve the collisionless Boltzmann equation (Vlasov equation) in 6-dimensional phase space, instead of conventional N-body simulations. It enables us to perform numerical simulation without any shot-noise, two-body relaxation and is suitable to investigate the dynamics of self-gravitating systems with high velocity dispersion such as a free-streaming of massive neutrinos. In this work, we focus on the dynamical effect of cosmic massive neutrinos such as the dumping of mass density power spectrums and the effect of behavior of the velocity dispersion of neutrinos on the large-scale structure formation in the universe. In addition, we discuss the possibility of estimating neutrino masses using other observation quantities because the direct observation of cosmic neutrinos is impossible with current observation accuracy.
48 Exploring particle accelerations at astrophysical shock waves by using supercomputer K

Yosuke Matsumoto(Department of Physics, Chiba University), Takanobu Amano(The University of Tokyo), Tsunehiko Kato(National Astronomical Observatory of Japan), Masahiro Hoshino(The University of Tokyo)

Abstract(Click to expand)

Astrophysical shock waves have been a candidate for the origin of cosmic-rays. In particular, X-ray emissions from supernova remnant shocks have provided great opportunities to examine how high-energy electrons are produced at collision-less shocks. Numerical simulations have revealed that electrons can be efficiently heated and accelerated via resonant interactions with plasma kinetic waves, such as the electron shock surfing acceleration mechanism in which electron-scale Buneman instability played key roles. Recently, Matsumoto et al. [2015] proposed a new acceleration mechanism by turbulent reconnection in the shock transition region through excitation of the ion-beam Weibel instability. In order to deal with the two different acceleration mechanisms in a self-consistent system, we examined 3D PIC simulations of quasi-perpendicular, high-Mach-number shocks. With the help of the high computational capability of the K computer, we successfully followed a long time evolution in which the two different acceleration mechanisms coexist in the 3D shock structure. The Buneman instability was strongly excited ahead of the shock front in the same manner as have been found in 2D simulations. In the transition region, the ion-beam Weibel instability generated strong magnetic field turbulence in 3D space. The turbulence was much stronger than those found in 2D simulations. Plasma blobs found in the turbulent region indicated magnetic reconnection took place in 3D magnetic field structures. As a result, electron energy spectrum in the downstream region exhibited a high energy tail following a power-law distribution. In this talk, we present how such relativistic electrons are produced during traveling in the 3D shock structure.

Programming Environments

49 Machine Learning Performance Analysis for Phylanx: An Asynchronous Array Processing Toolkit

Weile Wei(Louisiana State University), Hartmut Kaiser(Louisiana State University)

Abstract(Click to expand)

Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High-Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and artificial intelligence (AI), from utilizing performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and costs of acquiring the necessary skills required for programming at this level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite facing limitations which prevent this code from distributed runs. Here we present a solution which maintains both high-level programming abstractions as well as parallel and distributed efficiency. Phylanx, is an asynchronous array processing toolkit which transforms Python and NumPy operations into code which can be executed in parallel on HPC resources by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards.
50 Dynamic Multitasking in Upcoming XcalableMP 2.0

Hitoshi Murai(RIKEN R-CCS), Mitsuhisa Sato(RIKEN R-CCS), Masahiro Nakao(RIKEN R-CCS), Jinpil Lee(RIKEN R-CCS)

Abstract(Click to expand)

To achieve both higher performance and productivity on future manycoresupercomputers such as Post-K, a new programming model is stronglydemanded, and a task-based one is considered most promising becauseits asynchronous characteristic is useful to resolve some problems ofexisting models.On the other hand, XcalableMP (XMP), a directive-based parallellanguage based on Partitioned Global Address Space (PGAS), is proposedby the XMP working group of the PC Cluster Consortium, and we havebeen developing the Omni XcalableMP compiler, which will be availableon Post-K.The upcoming version 2.0 of XMP (XMP2.0) will support "tasklet" thatis a feature of dynamic multitasking for the task-parallel executionmodel, in addition to the existing data-parallel one. One of theadvantages of tasklets is that inter-node dependencies andinteractions between tasklets as well as intra-node ones can bespecified with simple directives to write task-parallel programs fordistributed-memory systems. We have started considering how toimplement effectively the tasklet feature in Omni XMP.In this presentation, the specification of the tasklet feature inXMP2.0 and the implementation of it we are considering for Omni XMPwill be described.
51 Scalability of MCMC algorithms on different parallel frameworks

Jeremiah Mbazor(Ulsan National Institute of Science and technology (UNIST)), Marco Torbol(Ulsan National Institute of Science and technology (UNIST))

Abstract(Click to expand)

Markov Chain Monte Carlo is a method of analysing systems with large variability such as particle swarms and ants random motion, predictive text and more recently in machine learning. The main advantage of Markov chain is the ability of tracking the evolution of a system reliability in time domain. The main drawback is the size of the transition matrix involved. Markov chain Monte Carlo is used to find the probability of failure of a system by running large number of simulations through the transition matrix until the final state is reach. Single instruction single data (SISD) processors, like a CPU core, execute one simulation at time through the transition matrix for every step. To achieve good performance clusters of CPUs must be used. In this study, we propose the used of single instruction multiple data (SIMD) processors, like a GPU. We also compare this with other parallel programming frameworks like MPI and threaded CPU operations. Thus, leading to a scalability map showing the efficiency of the different algorithm frameworks. This forms the foundational benchmark for modern applications especially deep learning
52 Asynchronous Graph Processing Using Message Driven Systems

Bibrak Qamar Chandio(Indiana University Bloomington), Prateek Srivastava(Indiana University Bloomington)

Abstract(Click to expand)

High Performance Computing hardware, which offers large amount of parallelism, is grounded in architectural and technology assumptions that practically impose limits on granularity of the parallelism. One such factor is Bulk Synchronous Parallel (BSP) model. To elevate this one promising exploration space is fine grain event driven execution models, such as ParalleX. This work explores Graphs---structures that inherently have large amount of parallelism. Asynchronously processing graphs using event driven execution models exposes this inherent fine grain parallelism. We implement asynchronous graph processing algorithms, especially the Single Source Shortest Path (SSSP) under HPX runtime system (which is an implementation of the ParalleX model) and concurrent priority queues, notably SprayList, Lotan and Shavit, and Linden and Jonsson Priority Queue. Performance results reveal insights into aspects of execution, specially the relationship between dynamic growth of work and scheduling policy and the overhead that is introduced by scheduling policies. From this experience we abstract out primitives that are need for asynchronous graph processing. These primitives will later help in defining an Instruction Set Architecture (ISA) for a graph based memory accelerator. The ISA will contribute towards Continuum Computer Architecture (CCA)--- a family of non-von Neumann architectures that combine parallel control flow semantics of the ParalleX execution model with homogenous highly replicated lightweight compute cells.

System Software

53 Prototype Implementation of MPICH and Data Transfer Framework for Post-K Supercomputer

Masamichi Takagi(RIKEN), Tatiana Martsinkevich(RIKEN), Masayuki Hatanaka(RIKEN), Yoshiyuki Morie(RIKEN), Atsushi Hori(RIKEN), Takumi Honda(RIKEN), Balazs Gerofi(RIKEN), Guo-Yuan Lien(RIKEN), Seiya Nishizawa(RIKEN), Hirofumi Tomita(RIKEN), Takemasa Miyoshi(RIKEN), Yutaka Ishikawa(RIKEN)

Abstract(Click to expand)

The successor of K computer (Post-K), which will be deployed in around 2021, has the advanced interconnect called Tofu Interconnect D (TofuD) for massive scalability. It has ten external ports and six DMA engines, which transfer data between the memory devices and the local ports of the internal router. It also has offload engines, which process orchestrated communication with many MPI processes without CPU processing.In this poster, we present the preliminary implementations of the two libraries to make application scale with tens of thousands of nodes.The first one is MPICH for Post-K, called RIKEN MPI. It is customized to Post-K and it complements the vendor-provided MPI by providing emerging optimizations and emerging MPI-standard features by frequent updates. It provides the offloaded, persistent collective protocols and the memory-saving point-to-point protocol for this purpose.The offloaded protocols are customized to the unique network / NUMA topology of Post-K. The "neighbor" type of the protocol, which performs a collection of communications with subsets of nodes, adds another optimization where the DMA engines are scheduled in a way that none of the engines are under- nor over-utilized. The memory-saving protocol uses the receive buffer in an efficient manner by sharing one receive buffer among multiple remote MPI ranks.We evaluate the offloaded protocol on a PRIMEHPC FX100 cluster, which has the predecessor of TofuD, and confirm the benefit over the non-offloaded counterpart.As for memory-saving protocol, we show that the protocol consumes a fixed amount of memory and thus it is scalable with over one million MPI processes.The second one is the Data Transfer Framework (DTF) which eliminates the file I/O bottleneck of the application written with the new, workflow programming style. In the style, an application comprises of multiple components and each of them performs one task in a workflow. The task usually performs calculations on data and passes the result to the next component(s) with file I/O. This file I/O becomes a bottleneck as the amount of data grows.DTF is designed to eliminate the bottleneck. It intercepts I/O calls written in PnetCDF interface and, instead of performing file I/O, directly sends the data to the target components with MPI.We have tested the performance of DTF in a real-world two-component application called SCALE-LETKF and showed that it performed well on the scale up to thousands of processes.
54 Operating System and Runtime Enhancements for the Post-K Computer

Balazs Gerofi(RIKEN), Atsushi Hori(RIKEN), Masamichi Takagi(RIKEN), Dominique Martinet(RIKEN), Yutaka Ishikawa(RIKEN)

Abstract(Click to expand)

RIKEN Center for Computation Science is leading the development of Japan's next generation flagship supercomputer, the successor of the K Computer. Part of this effort is to design and develop a system software stack that suits the needs of future extreme-scale computing. In this poster, we introduce IHK/McKernel and PiP. IHK/McKernel a lightweight multi-kernel operating system that runs Linux and a light-weight kernel side-by-side on compute nodes with the primary motivation of providing scalable and consistent performance for large-scale HPC simulations. At the same time, it retains a fully Linux compatible execution environment. We provide an overview of the software architecture and show performance results on the full-scale Oakforest-PACS machine, a many-core based supercomputer consisting of 8,192 Intel Xeon Phi Knights Landing nodes.There are two widely used parallel execution models; multi-process and multi-thread. Parallel programs consist of several (or many) execution entities, processes or threads, and most importantly, they communicate and/or interact with each other. In the multi-process model, each process has its own virtual address space and it is not allowed to access the data owned by the other process. In the multi-thread model, threads share the same virtual address space, but all statically allocated variables are shared among the thread. In the multi-process model, nothing is shared. This makes inter-process communication hard. In the multi-thread model, everything is shared. The shared variables and data must be protected to avoid race conditions. The pros of a model can be the cons of the other model. There is no race condition happens on variables and data since each process has its own variable and data in the multi-process model. Threads can easily interact and communicate with the thread, since they share the same virtual address space. Process in Process (PiP) is a new concurrent execution model that takes the best of multi-process and multi-thread execution models. In this model, variables are privatized so that each task has its own variable set and tasks share the same virtual address space. The execution entity in this model is called a task since it does not follow the definitions of either processes or threads. Tasks can interact and communicate with others easily and race conditions can only happen on the explicitly shared variables and/or data. Since tasks run in the same virtual address space, any data owned by the tasks can be accessed if the addresses of the variables and/or data are known. In this way, the cons of the conventional models turn into the pros. While this idea is not new, existing implementations require either a dedicated OS kernel or specialize language processing system consisting of compiler, linker, runtime system, and debugger. Process-in-Process (PiP) is the first implementation as a user-level library, which needs neither dedicated OS kernels nor language systems. Thus, PiP is portable and easy to deploy.
55 Enhancing MPI-IO with Topology-Awareness at the K computer

Yuichi Tsujita(RIKEN), Atsushi Hori(RIKEN), Atsuya Uno(RIKEN), Yutaka Ishikawa(RIKEN)

Abstract(Click to expand)

Recent supercomputers can provide high computing power anda large amount of storage spaces by parallel file systems.Although parallel file systems such as Lustre havehigh potential in file I/O,hierarchical configuration of compute nodes and parallel file systemshas brought us a difficulty in improving file I/O performance.One reason is mismatch between process layout on compute nodesand configuration of parallel file systems including inter-connectionsbetween compute nodes and parallel file systems.We focus on process layout optimizationin order to gain I/O performance using collective MPI-IOat a large scale of supercomputers.Collective MPI-IO has been playing a big role inapplication oriented I/O libraries such as HDF5 andPnetCDF as an underlying parallel I/O software stack.However, collective MPI-IO can not achieve enough performancedue to the above-mentioned mismatch.We have been proposing aggregator layout optimizationin an MPI-IO implementation, named ROMIO, at the K computer,where aggregators are processes which play file I/Oinside the MPI-IO implementation.Since behavior of I/O operation is dependent on layout ofassigned compute nodes and I/O nodes,we have realized topology-aware aggregator layout optimizationwhich is suitable for parallel file systems at the K computer.We present the optimization which has improved I/O performancein benchmark runs compared with the original implementation.
56 Development of Scientific Numerical Libraries on post-K computer

Toshiyuki Imamura(RIKEN), Yusuke Hirota(RIKEN), Daichi Mukunoki(RIKEN), Shuhei Kudo(RIKEN), Akiyoshi Kuroda(RIKEN), Naoki Sueyasu(Fujitsu)

Abstract(Click to expand)

RIKEN and Fujitsu are jointly developing Arm-based numerical libraries optimized for the brand-new microprocessor A64FX. The A64FX is newly designed for a national flagship system called `post-K.’ One of the key enhancements of the A64FX is wider-and-flexible-length vector format called Scalable Vector Extension (SVE). We present the current status of the task force work focused on netlib+SSL-II and the enhancement of OSS mainly developed by RIKEN, for example, EigenExa, Batched-BLAS, and so forth. Most of the numerical libraries are optimized with a new feature of Arm-SVE, high-level thread parallelization (single A64FX chip has 48 worker cores and 2 or 4 cores for OS activities), and the Tofu interconnect D, which is still a 6D mesh-torus topology. Since the DGEMM kernel, which is the heart of scientific numerical libraries, achieved approximately 2.5TFLOPS using an A0 stepping of the A64FX, it is expected to boost the performance of other significant BLAS kernels and other scientific numerical packages. Also, we demonstrate the results of codesign among nine prioritized application fields, such as an improvement of the performance bottlenecks found in some applications, implementation of new kernels, and enhancement of functionalities of the existing scientific numerical packages. Typical libraries to be exhibited are half-precision (FP16) BLAS kernels; the flexibly schedulable batched framework and full implementation of the batched BLAS routines, a high throughput eigenvalue solver especially for a single node execution, a highly scalable divide-and-conquer framework for distributed parallel eigenvalue solver, the communication avoiding 2.5D-based PDGEMM routine.

HPC Operations Management

57 Development of a visual analytics system which can search for the cause of failure interactively

Kazuki Koiso(Kobe University), Sakamoto Naohisa(Kobe University), Jorji Nonaka(R-CCS)

Abstract(Click to expand)

Large-scale scientific computing facilities usually operate expensive HPC (High Performance Computing) systems, which have their computational and storage resources shared with the authorized users. On such shared resource systems, a continuous and stable operation is fundamental for providing the necessary hardware resources for the different user needs, including large-scale numerical simulations, which are the main targets of such large-scale facilities. For instance, the K computer installed at the R-CCS (RIKEN Center for Computational Science), in Kobe, Japan, enables the users to continuously run large jobs with tens of thousands of nodes (a maximum of 36,864 computational nodes) for up to 24 hours, and a huge job by using the entire K computer system (82,944 computational nodes) for up to 8 hours. Critical hardware failures can directly impact the affected job, and may also indirectly impact the scheduled subsequent jobs. To monitor the health condition of the K computer and its supporting facility, a large number of sensors has been providing a vast amount of measured data. Since it is almost impossible to analyze the entire data in real-time, these information has been stored as log data files for post-hoc analysis. In this work, we propose a visual analytics system which uses these big log data files to identify the possible causes of the critical hardware failures. We focused on the transfer entropy technique for quantifying the “causality” between the possible cause and the critical hardware failure.The system we propose consists of 4 subsystems. The subsystems are Graph Plot view subsystem, Spatiotemporal view subsystem, Heat Map Plot view subsystem and Causality Plot view subsystem. In the Heat Map Plot view subsystem, we utilized spectralbiclustering method for limit the racks used for calculations. As a case study, we focused on the critical CPU failures, which required subsequent substitution, and utilized the log files corresponding to the measured temperatures of the cooling system such as air and water.
58 An Approach of Energy Aware in HPC Environment

Laercio Pioli Junior(Federal University of Juiz de Fora), Mario Dantas(Federal University of Juiz de Fora)

Abstract(Click to expand)

With the exponential growth of computational power, data-intensive computing and petascale scientific applications are consuming a large amounts of processing and resources. In light of this, power management becomes a big issue for this kind of environment that needs to manage the correct spend of energy consumption. These applications are normally a kind of parallel computing applications, which use techniques, to improve the performance, data processing, post-hoc analysis and visualization. Usually, these applications need to accesses the storage device to save or manage information, which implies a high IO request rate, increasing then the energy consumption. High energy consumption is one of the most important issues in HPC environments. Multiple directions are being proposed by researchers to minimize these power challenge in HPC environments. Some of these directions are: 1) processing power needed by applications, 2) correct management of data and hardware to enhance power usage, 3) scheduling of tasks, 4) fault-tolerance mecanism throught checkpointing-restart in HPC environments among others. As we can see, these studies are being developed in HPC environments, but it's still necessary these energy-aware consumption be extended and managed in HPV scenario. This study focuses on the energy aware in High Performance Computing as well as High Performance Visualization environments proposing an approach that colaborates with green computing paradigms.
59 User Support Activities by RIST in the HPCI System including K computer in Japan

Yoshinori Kusama(RIST (Research Organization for Information Science and Technology)), Takaaki Noguchi(RIST (Research Organization for Information Science and Technology)), Setsuko Kondo(RIST (Research Organization for Information Science and Technology)), Motoi Okuda(RIST (Research Organization for Information Science and Technology))

Abstract(Click to expand)

High Performance Computing Infrastructure (HPCI) of Japan including "K computer" [1] consisting of a wide variety of supercomputer resources provides high performance computing services for worldwide researchers and engineers of both academia and industry. Research Organization for Information Science and Technology (RIST), which is Registered Institution for Facilities Use Promotion of the “Specific High-speed Computer Facilities” (K computer) and the Representative for HPCI Operation, is responsible for user selection and resource allocation, user support and dissemination of achievements in HPCI. Among these responsibilities, this paper presents user support activities by RIST.RIST provides the following user support activities: provision of variety of information; response to inquiries and requests from HPCI users; technical supports; implementation of seminars and workshops; and other necessary support services. Key activities are introduced in this abstract.Helpdesk: RIST provides a variety of user support services through Helpdesk as the general and single contact point for HPCI users and possible users in future. It provides “first-level supports” such as assisting HPCI application, providing hardware and software information, helping building and running jobs, advising libraries and tools, and solving first-level technical problems. Helpdesk responds to about 1,600 inquiries every year.Advanced-Level Technical Supports: Experts of a variety of scientific areas and/or of computer systems and applications having advanced techniques and knowhow provide high-level technical supports as “Advanced-Level Technical Supports”. Main supports are porting and optimization of application software to the target platform; analysis of the job performance (communication properties, evaluation of imbalance and single-node performance), improvement of job performance (serial and parallel optimization), visualization of simulation outputs. These supports are the key to make maximum use of the HPCI computational resources. RIST provides about 30 advanced-level supports every year. The expert support reports describing detailed support contents are shared on the HPCI portal site [1].Application Software Provisioning: As part of the user support service, RIST has installed application software programs widely used or expected to be widely used in HPCI projects: open source software (OSS) programs; national projects application programs in order to make them ready-to-use with a view to improving convenience of users, creating early outcomes and extending HPCI users and so on. Four OSS programs were installed in the K computer and six national projects application programs were installed in three HPCI systems except for the K computer in 2017, and the installation status and usage information of each software are introduced on the HPCI portal site [2]. The activity is being conducted also in 2018.[1] http://www.hpci-office.jp/folders/english.[2] http://www.hpci-office.jp/pages/e_appli_software.
60 An Improvement of availability by inter-site data redundancy in HPCI shared storage

Hiroshi Harada(RIKEN), Osamu Tatebe(University of Tsukuba Center for Computational Sciences), Hidetomo Kaneyama(RIKEN), Seiichirou Naka(The University of Tokyo, Information Technology Center), Toshihiro Hanawa(The University of Tokyo, Information Technology Center)

Abstract(Click to expand)

HPCI shared storage is a data sharing infrastructure for the HPCI (http://www.hpci- office.jp/folders/english), and operated jointly by R-CCS and the University of Tokyo. HPCI shared storage adopts the Gfarm distributed file system (http://oss- tsukuba.org/en/software/gfarm), and it is possible to use it from Japanese major supercomputers including K-computer. HPCI shared storage system was replaced in FY 2017 and currently provides a 45 PB file systems space.In the HPCI shared storage, redundancy operation, in which one or more data replications are respectively allocated to R-CCS and the University of Tokyo started in February 2018.Due to this redundant operation between R-CCS and the University of Tokyo, even if a serious failure occurred at one site, service could be continued by data replication at the other site. Actually, 24th of August 2018, the R-CCS system temporarily stopped due to the influence of the typhoon passing through Kobe city, but the HPCI shared storage service was continued by the University of Tokyo. At this time, the master meta data server, which normally running at R- CCS, was failed over from R-CCS to the University of Tokyo. In the following, an instantaneous blackout occurred due to a lightning strike at the University of Tokyo in 31th of Aug 2018, and the all of equipment at the University of Tokyo suddenly stopped operating, but the HPCI shared storage continued operation with R-CCS equipment. In December 2018, it became necessary to stop the University of Tokyo equipment for about a week to increase the disk system of the University of Tokyo, but beforehand using redundant data by R-CCS equipment, we succeeded in continuing the service both writing and reading access. In the 3Q of 2019, due to R-CCS facility construction, RIKEN equipment will be suspended for a couple of weeks, but HPCI shared storage services are expected to continue as described above.During March 2018 until the end of December 2018, the system was stopped only during planned maintenance in April and October 2018, during which the operating ratio recorded 99.12%. We will present on the details of improvement of availability by our inter-site data redundancy and our operation.
61 Riken Post-K processor simulator

Yuetsu Kodama(RIKEN), Tetsuya Odajima(RIKEN), Akira Aasato(Fujitsu Limited), Mitsuhisa Sato(RIKEN)

Abstract(Click to expand)

We have been developing a post-K processor simulator for the aim of developing the program in Post-K at an early stage. This simulator is based on the general-purpose processor simulator named gem5, so it does not simulate the post-K processor hardware itself. However, we think that sufficiently accurate simulation is available by simulating out-of-order pipeline execution in cycle-accurate with detailed parameter tuning and function expansion for post-K processor. In this simulator, we aim to estimate the execution speed of application on one node of Post-K processor with accuracy that makes relative evaluation that enables application tuning possible. In this poster, we show the implementation of this simulator and verify its accuracy compared with the number of execution cycles of the test chip of Post-K.
62 Operations Management Software for the Post-K computer

Atsuya Uno(RIKEN), Toshiaki Mikamo(Fujitsu Ltd.)

Abstract(Click to expand)

The Post-K computer will be the exa-scale super computer system andwill achieve application execution performance up to 100 times that ofthe K computer.The main functions of the operations management software are thesystem management, the job scheduling and the user management.Riken and Fujitsu enhanced the Fujitsu’s software called“Parallelnavi” that manages thousands-node supercomputer systems tomanage the K computer on a scale of more than 80,000 nodes and enabledto achieve a stable operation and a high-performance computingenvironment.In addition, during the operation of the K computer for more than 8years, we have learned many things from it and have found some pointsthat we have to enhance to improve the operation of the K computer.Based on these experiences, we have developed the operationsmanagement software for the Post-K computer.In this poster, we introduce the outline of the operations managementsoftware for the Post-K computer.

Artificial Intelligence

63 A Base DNN model selection method for efficient transfer learning toward smart society

Yosuke Ueno(The University of Tokyo), Masaaki Kondo(The University of Tokyo)

Abstract(Click to expand)

Towards realizing future "smart society", a collaboration between edge and cloud (supercomputer) systems known as computing continuum is important. As one of the examples, we are studying a cloud-edge cooperative fine-tuning technique for efficient deep learning in constructing a specialized inference engine for an individual environment. Recently, image recognition achieves significant improvement by using deep Convolutional Neural Networks (CNN) but training a deep CNN is a time-consuming task because it requires a large number of labeled datasets and huge compute resources. Fine-tuning, one of the transfer learning methods, is promising since it tunes existing CNNs that have been trained using a large labeled dataset for different purposes. One of the challenges of fine-tuning is to choose a base model which is suited for a given target for efficient fine-tuning. To tackle these challenges, we propose the model selection criteria using the internal state of each model when target task images are input to it. The basic system concept of our proposed technique is that the cloud selects a base model for fine-tuning using a few sample images transferred from the edge and the edge having whole images for the target task performs training. Our experiments were conducted with AlexNet and Resnet18 models that were trained with various subsets of ImageNet. We calculated the proposed criteria with several pre-trained models for certain target tasks. We also performed fine-tuning with all the combinations of pre-trained models and target tasks to evaluate the correlation between the criteria and the accuracy of fine-tuned models. Some criteria showed a strong correlation and a strong rank correlation with the fine-tuned accuracy. Overall, the results show that some of our proposed criteria are effective to select the base model used for fine-tuning. We also found that original convolutional layers in pre-trained models specially using deeper layers do not bring good accuracy for some cases of fine-tuning. We also investigated a fine-tuning methodology that uses only a part of convolutional layers of base models by pruning some of deeper layers to get a more accurate model efficiently. With the proposed criteria, we can select a part of layers that acquires the suitable features for target tasks. In the case of Alexnet and Resnet18, about 34% and about 20\% of the computation amount for convolution layers were reduced with improving the recognition accuracy of the target tasks about 1% and 4%, respectively.
64 Performance Tuning of Deep Learning Framework Chainer on the K computer

Akiyoshi Kuroda(RIKEN), Kiyoshi Kumahata(RIKEN), Kazuo Minami(RIKEN)

Abstract(Click to expand)

Recently machine learning by deep learning has become popular.Applications and research using GPU are advancing in computational science fields.However, it is possible to make many calculations by taking advantage of the characteristics of CPUs, even with massively parallel computers.Here, we introduce a performance tuning procedure for Chainer, which is a representative framework for utilization of machine learning on the K computer.Chainer expresses the hierarchical structure of deep learning using Python, and all calculations can be realized using numPy without special libraries.Python installed on the K computer has not been much optimized due to problems with the stability of the calculationsBy optimizing floating point underflow exception when building Python, elapsed time was improved to 1/3.37.Moreover, simply by replacing the SSL2 gemm library called by Python with the thread-parallel version, section elapsed time was improved to 1/4.62, the total elapsed time was improved to 1/1.14, and the performance efficiency was improved about 46.3\%.Many of the costs were in square root and arithmetic calculations performed when filters were updated.These operations are not optimized when calculated using numPy and are particularly slow on the K computer.By replacing the kernel with software pipelining and SIMD optimization by Fortran library, the kernel elapsed time was improved to 1/11.24 and total elapsed time was improved to 1/13.07 [Fig.1].There are some limitations on the use of Chainer on the K computer.It is necessary to prepare the learning data beforehand and to stage-in the data to an appropriate storage system.Moreover, since Python is in the shared storage, it takes time to load the library.However, a CPU parallel version of Chainer was developed as ChainerMN in 2018.It has high scalability and allows large-scale machine learning on the K computer by installing it

Consortium for Next Generation Combustion System CAE

65 Fully compressible combustion simulation of RCM in hierarchical Cartesian mesh system by Immersed boundary method

Wei-Hsiang Wang(R-CCS, Japan), Chung-Gang Li(Kobe University, Japan), Makoto Tsubokura(R-CCS & Kobe University, Japan)

Abstract(Click to expand)

The combustion of Rapid Compression Machine (RCM) is investigated numerically. All speed compressible flow solver by Roe scheme and 5th order MUSCL coupled with species transport equations is adopted for the flow and temperature field and reacting species fractions. The chemical reaction of combustion is conducted by equilibrium solver of Cantera module, which is used for evaluating the equilibrium state of the reacting flow and merged with the flow solver and G-equation flame front treatment. For the treatment of RCM geometry and moving piston, the Immersed boundary method is introduced in Hierarchical Cartesian mesh system. The validation is carried out by the comparison with experimental work. The simulation showed good agreement with experimental data by the quantities results of chamber pressure and the flow visualization of flame patterns and propagation speed.
66 LES of turbulent combustion with water spray

Takafumi Honzawa(Tokyo Gas Co., Ltd. and Kyoto University), Reo Kai(Kyoto University), Makoto Seino(Numerical Flow Designing Co., Ltd.), Takayuki Nishiie(Numerical Flow Designing Co., Ltd.), Kotaro Hori(Numerical Flow Designing Co., Ltd.), Ryoichi Kurose(Kyoto University)

Abstract(Click to expand)

Submerged Combustion Vaporizers (SCVs) are often used as LNG vaporizers. Those burners are designed to reduce the maximum flame temperature, which causes the reduction of NOx emission. In order to reduce the maximum flame temperature, it is useful to equip the water spray system for the burners. In this study, large-eddy simulations of the turbulent combustion fields generated by the burner are conducted, and the effect of the water spray injection on the combustion behavior is investigated. As the combustion model, a non-adiabatic flamelet approach, which can take into account the effects of heat losses due to water evaporation and cooling wall, is employed. The results show that the water spray injection effectively causes the reductions of maximum gas temperature and then the NOx emissions. As our ongoing work, the development of flamelet database by machine learning is also introduced.
67 LES modeling and simulation of coal gasification on an O2-CO2 blown coal gasifier

Hiroaki Watanabe(Kyushu University), Ryoichi Kurose(Kyoto University), Kenji Tanno(Central Research Institute of Electric Power Industry)

Abstract(Click to expand)

A numerical simulation of coal gasification on an O2-CO2 blown coal gasifier of the oxy-fuel Integrated coal Gasification Combined Cycle (IGCC) system was performed to investigate the phenomena taking place within the coal gasifier by means of a large-eddy simulation. The coal gasifier is operated in a low oxygen ratio condition to produce combustible gases. The coal gasifier employed in this semi-closed IGCC system will be operated in the O2-CO2 blown condition and should be appropriately designed in such a condition that we have no experience. The particle laden two-phase reacting flow within the gasifier was modeled with the Eulerian-Lagrangian manner employing the PSI-CELL method. The subgrid scale turbulence was considered by the dynamic Smagorinsky model. Three chemical processes, devolatilization, char gasification and gas phase reactions were considered. For the gas phase reactions, Scale Similarity Filtered Reaction Rate Model (SSFRRM) was used to consider the effect of the subgrid scale turbulence. The finite volume, unstructured, variable density incompressible LES solver, FFR-Comb, was used. In this study, two cases were performed in terms of the amount of CO2 in a gasifying agent. Results showed that the predicted gas temperature distribution and product gas composition showed good agreement with those obtained by the experiment. It was found that gas temperature in the combustor was significantly affected by issuing CO2 in the gasifying agent and became lower as the amount of CO2 in the gasifying agent increased. This was considered due to that the heat capacity of the gas mixture increased with increasing the amount of CO2. The strong swirling flow was formed in the combustor by the jet flows from the burners mounted tangentially and the almost particles were traveling near the inner wall due to the centrifugal force. It was revealed that the flow patterns for two cases did not show a large difference. The effect of issuing CO2 on the gasification performance was also discussed. The calorific value and gasification efficiency were improved with increasing CO2 in the gasifying agent. This suggested that CO2 recycled from the exhaust gas of the gas turbine played a positive role and enhanced the gasification reaction within the coal gasifier. The present LES was well demonstrated as a powerful tool to design the O2-CO2 gasifier in the oxy-fuel IGCC system.
68 Large-eddy simulation of a supercritical CO2 combustion field in a realistic combustor

Parikshit Jain(Toshiba Energy Systems & Solutions Corporation), Yasunori Iwai(Toshiba Energy Systems & Solutions Corporation), Yoshihisa Kobayashi(Toshiba Energy Systems & Solutions Corporation), Masao Itoh(Toshiba Energy Systems & Solutions Corporation), Takayuki Nishiie(Numerical Flow Designing Corporation), Ryoichi Kurose(Kyoto University)

Abstract(Click to expand)

As global demand for energy increases while environmental regulations tighten, novel power generation cycles are being developed to meet market needs. In order to meet this demand, 8Rivers Capital Limited Liability Company has been engaged in the development of an environmentally conscious thermal power generation system based on a Supercritical CO2 gas turbine cycle (Allam cycle). Toshiba ESS has been developing a turbine and a combustor for the Allam cycle. The cycle requires oxy-fuel-CO2 combustion at approximately 30MPa and 1150° C turbine inlet temperature. Designing a durable hardware capable of thousands of hours of operation with efficient combustion as well as production of clean CO2 exhaust is a challenging task which also requires an approach using numerical simulations. However, there are few reports of combustion simulation in such ultra-high pressure. In this study, large-eddy simulation (LES) is applied to a Toshiba’s supercritical CO2 combustor. In order to take the supercritical conditions into account, the Soave-Redlich-Kwong equation of state is applied, and the Chung model is used for transport properties. A dynamically thickened flame model is employed as a turbulent combustion model, and the (5 species, 2 reactions model) is used for the reaction mechanism. Due to extreme operation conditions inside the combustor, it is crucial to keep the combustion liner at adequate temperature. To predict the wall heat load on the combustion liner, a coupled fluid-structure conjugate heat transfer method is applied. The computation is carried out using an unstructured LES solver FrontFlow/Red modified by Kyoto University, CRIEPI, and NuFD (FFR-Comb). The computational mesh for the combustor consists of 129 million vertexes and 204 million cells. The computation took approximately one week using 10,000 cores of the “K computer” at RIKEN Advanced Institute for Computational Science.It was observed that a stable flame is formed and the liner wall temperature is strongly affected both by cooling CO2 stream which flows over the outer and inner surface of liner wall and the flame temperature. The metal temperature results have also been examined form the simulation results.
69 Heat Transfer between Wall and Impinging Spray Flames under Compression-Ignition Engine like Conditions: A DNS Study

Abhishek L. Pillai(Department of Mechanical Engineering and Science, Kyoto University, Japan), Takuya Murata(Department of Mechanical Engineering and Science, Kyoto University, Japan), Takato Ikedo(Toyota Central R&D Labs., Inc., Japan), Ryo Masuda(Toyota Central R&D Labs., Inc., Japan), Kazuhisa Inagaki(Toyota Central R&D Labs., Inc., Japan), Ryoichi Kurose(Department of Mechanical Engineering and Science, Kyoto University, Japan)

Abstract(Click to expand)

Automobiles account for approximately 16% of the total greenhouse-gas emissions worldwide. To mitigate the challenges of climate change, stricter standards on the thermal efficiency of automobile engines are being imposed by countries globally. Therefore, considerable research effort is being dedicated towards the development of Compression-Ignition (CI) engines with higher thermal efficiencies. Reducing heat loss through the combustion chamber wall tends to improve the thermal efficiency of engines, and hence, an accurate estimation of wall heat loss is paramount during the initial design stages. In this work, Direct Numerical Simulations (DNS) of spray flames (two-phase reacting flows) impinging on the wall of a constant volume combustion chamber, under CI engine like conditions are performed. The DNSs employ an Eulerian-Lagrangian framework [1], wherein the evaporating fuel droplets are tracked as Lagrangian mass points, while the gas-phase is treated as an Eulerian continuum. To capture the transient heat conduction within the wall, conjugate heat transfer simulation is also coupled to the DNS of spray combustion. The fuel considered for the liquid spray in these simulations is n-decane. A two-step simplified reaction mechanism tailored specifically to predict the ignition delay time and burnt gas temperature, for a wide range of equivalence ratio and fresh gas temperature is used to model the combustion of n-decane. Simulations are performed for various fuel spray injection velocities, and the DNS solutions are used to analyze the influence of fuel injection velocity on wall heat loss, during spray flame-wall interaction process. The Nusselt (Nu) and Reynolds (Re) numbers corresponding to convective heat transfer are correlated as Nu ∝ Re^n. Previous experimental investigations have shown that the value of this exponent n lies in the range 0.4-0.5 [2,3], which is less than the value of n in the conventional Woschni equation [4]. DNS results of the present work deduce that n = 0.58, which is consistent with experimental findings [2,3]. References[1] A. L. Pillai, R. Kurose, Combust. Flame 200 (2019) 168-191.[2] K. Inagaki, J. Mizuta, Y. Nomura, T. Ikedo, R. Ueda, Trans. Soc. Automotive Eng. Japan 47 (6) (2016) 1297-1303.[3] T. Oguri, Bulletin of JSME 3 (11) (1960) 363-369.[4] G. Woschni, SAE 670931 (1967).
70 Large-eddy simulation of combustion instability of spray combustion: Effect of time fluctuation of liquid fuel mass flow rate

Jun Nagao(Kyoto University), Abhishek Pillai(Kyoto University), Ryo Awane(Kyoto University), Ryoichi Kurose(Kyoto University)

Abstract(Click to expand)

To reduce the emissions of NOx in gas turbine engine, Lean Premixed Prevaporized (LPP) combustion is one of the effective solutions. However, lean turbulent combustion is inherently unstable and makes the combustor susceptible to combustion instability, which is characterized by severe pressure and heat release oscillations. Combustion instability causes high combustion noise levels and damage to the combustor. The detailed mechanism of combustion instability is still not completely understood and the issue remains yet unresolved because of the complexity of this phenomenon. To elucidate the underlying physics of combustion instability, researches have been conducted not only with experiments, but also with computational simulations [e.g., 1,2]. Recently, Kitano et al. [2] showed in terms of Large-eddy Simulation (LES) that the droplet diameter distribution of fuel spray, that is injected into the combustor, has a significant effect on combustion instability. However, they did not consider the effect of transient fluctuations of mass flow rate of liquid fuel. This factor is also crucial and could be a solution for attenuating combustion instability by controlling it. In this study, therefore, the time fluctuation of mass flow rate of liquid fuel caused by pressure fluctuations in the combustion chamber is taken into account, and the effect on the spray combustion behavior in a back-step flow is investigated using LES. This simplified model for time fluctuation of mass flow rate of liquid fuel is derived from Bernoulli’s equation. Four cases are examined, each having a different phase delay: the phase gap between fluctuations of pressure and mass flow rate of liquid fuel, and the effect of this factor on combustion instability is investigated thoroughly. The results show that the intensity of pressure oscillations has maximum value for the case with 180 degree phase delay. The phase delay influences the time variations of liquid droplet diameter, distributions of heat release rate, evaporation rate and so on, which consequently alter the behavior of combustion instability in each case.
71 Efficient diesel engine simulation using chemical kinetics and parallel computing

Tsukasa Hori(Osaka University), Sumii Masairo(Osaka University), Kho Fujiwara(Kobe university), Makoto Tsubokura(Kobe university), Fumiteru Akamatsu(Osaka University)

Abstract(Click to expand)

A Computational Fluid Dynamics (CFD) code has been developed in order to carry out a diesel engine simulation in consideration of chemical kinetics in short computational time. OpenFOAM-2.4 is used as base code. To reduce the computational time for the reaction mechanism, the reaction mechanism of diesel fuel is modeled by the skeletal mechanism of n-tridecane which consists of 49 chemical species and 85 reactions. Furthermore, LSODES, where is a sparse matrix version of the Livermore Solver for Ordinary Differential Equations (LSODE), is incorporated in this code as ODE solvers for solving reaction. We also incorporate the moving grid for piston compression and decompression. The fuel injection is modeled by DDM (Discrete Droplet Method). The droplet breakup is modeled by WAVEMTAB. The wall heat transfer model is Amsden. The simulation was done by using the Reedbush-U supercomputer in Tokyo university. Free and wall impingement sprays in the constant volume vessel were simulated to validate the code. The results show that the significant computational time reduction is obtained by using LSODES and parallel computing in comparison with the conventional simulation using VODE (a Variable coefficient ODE solver) and serial computing. Furthermore, there is also a good agreement between the fuel spray penetrations of the simulation and the experiment when we use the finer grid. Then, we simulated the diesel engine combustion with varying the injection pressure. The results show that the trend of in-cylinder pressure profile of the computation is in good agreement with that of experiment.
72 Large eddy simulation of turbulent combustion flows in an industrial gasturbine combustor by multi-scalar flamelet approach model

Nobuyuki Oshima(Hokkaido University), Ryosuke Kishine(Hokkaido University), Takeo Oda(Kawasaki Heavy Industires Ltd.)

Abstract(Click to expand)

Gasturbine is an energy resource equipment for electric power plants and major industries, a performance improvement of which may have an important contribution for sustainable global environment. Therefore not only its detail design optimization and also a conceptual improvement are investigated, where a conversion from hydro-carbon fossil fuels to hydrogen (or hydrogen-rich alternative fuel), is a feasible approach to reduce CO2 exhaust without loss of performance. For promoting such a new conceptual design, an upstream loading of numerical predictions which should be applied both to an optimized design and to a concept validation before and after production and/or operation.A turbulent combustion in gasturbine combustor is a most difficult element to predict by a traditional numerical model, because of its complex phenomena coupling with turbulence flow and chemical reaction processes. LES with the multiple-scalar flamelet approach is performed for an industrial gasturbine combustor, which can appropriately simulate a partial premixed combustion of methane-hydrogen mixed fuel operation by using a multiple-scalar flamelet approach model newly developed by authors. Its instantaneous and time-averaged data by LES result are also analyzed for predict NO production in thermal and prompt regime to compare with experimental data. These analyses reveal different mechanisms of turbulent fluctuations to increase or decrease NO productions. The above numerical simulations were performed by high-performance supercomputers for LES and local cloud system for NO analysis, which enable these different levels of program development and operation for treating multi-disciplinary complex design.
73 Fully Coupled Simulation of Coal Gasification System Using LES based Solver for Combustion and Thermal Conduction Solver in Vessel

Tomonori Yamada(The University of Tokyo), Naoto Mitsume(The University of Tokyo), Hiroaki Watanabe(Kyushu University), Ryoichi Kurose(Kyoto University), Hideaki Uchida(The University of Tokyo), Shinobu Yoshimura(The University of Tokyo)

Abstract(Click to expand)

We are working on one of the nine priority issues of the FLAGSHIP 2020 project, i.e. Priority Issue 6: Accelerated Development of Innovative Clean Energy Systems. As one of the four target energy systems, we have been conducting researches on coal gasification plants. The coal gasification process is one of the key technologies to drastically reduce CO2 emission from coal fired power generation. Coal is crushed into fine particulate matter and partially burned into gas in a high-pressure and elevated-temperature environment. It results in turbulent combustion flow with particulate matter upwards. At the same time, chemical components such as silica and sulfur contained in coal are turned into melting matter called slug and are taken away downwards through slug hall. Such complex and highly nonlinear multi-scale and multi-scale physics phenomena must proceed continuously for smooth operation of the plant. It is very difficult and time-consuming to find appropriate design and operation conditions only using experimental approaches. Then, to reproduce the multi-physics and multi-scale phenomena precisely, we have been developing a large scale two-way coupled simulation of thermo-combustion-fluid-melting-structure interaction of a full-scale gasification system of lab-scale by integrating multiple independent parallel solvers well tuned for the K computer and being tuned towards the Post-K computer. The phenomena of thermo-combustion-fluid-melting in the gasification zone is solved using a unstructured large-eddy simulation (LES) solver named FFR-Comb. In the solver, the compressible reacting flow equations with two-way coupling between the continuous phase (gas phase) and dispersed phase (particulate matter of coal) are solved, while the melting flow of the slug are solved using a level-set method. The thermal conduction in the vessel zone is solved using a hierarchical domain decomposition method (HDDM) based thermal conduction solver, ADVENTURE_Thermal. In addition, to simulate the processes of cooling pipes embedded in the vessel accurately, we model heat transfer in each pipe as a one-dimensional (1D) convection-diffusion equation, and develop a discontinuous Galerkin based solver. These 3D and 1D solvers are coupled by a staggered coupling scheme with a subcycling technique to deal with different time increments in the 3D and 1D analyses. Furthermore, the two-way coupled heat transfer between the combustion zone solved by FFB-Comb and the thermal conduction zone in the vessel solved by ADVENTURE_Thermal is dealt with the parallel coupling tool named REVOCAP_Coupler. In this presentation, we present the latest development and achievement of the above multi-scale and multi-physics simulation.
74 Numerical analysis of pulverized coal combustion

Masaya Muto(Meijo University), Hiroaki Watanabe(Kyushu University), Ryoichi Kurose(Kyoto University)

Abstract(Click to expand)

In the field of combusting flow, the development of the combustion technology meeting the demands such as reducing environmental pollutants has been conducted by many researchers and developers. To make a further improvement, an understanding of the flow field and chemical reaction processes in the combustion furnace is necessary. However, it is difficult to obtain such information solely on the basis of experiments. Applying numerical simulation to combustion phenomena using detailed chemical reaction mechanism or applying numerical simulation to flow field inside large scale furnace in order to investigate combustion phenomena is challenging from the viewpoint of computational cost even now. In this study, an ignition process in pulverized coal combustion is investigated by a 2D direct numerical simulation (DNS) with a detailed chemical reaction mechanism. And also, NOx formation process in pulverized coal flame inside multi-burner pulverized coal combustion furnace is investigated by a 3D large-eddy simulation (LES) with a two-step global chemical reaction scheme. In 2D-DNS, the ignition phenomena occurring in a mixing layer is investigated. The results show that the ignition occurs in the rich mixture fraction condition, as the gas temperature is also high in this condition because of the high initial temperature set for the central coal particles. However, once ignition occurs, the combustion reaction dramatically takes place in the condition that equivalence ratio is slightly lean value compared with the stoichiometric value and shows the shortest ignition delay time in the calculation for autoignition of homogeneous mixtures of volatile matter and air. In 3D-LES, the effects of an in-furnace blending method, in which different kinds of coal (high volatile coal and low volatile coal) are injected at each burner stage, on NOx emission is investigated. The result shows that oxygen is rapidly consumed and NOx decreases because the reducing atmosphere becomes dominant due to the lack of oxygen near the burner from which the low volatile coal is injected. These information on the ignition process using detailed chemical reaction mechanism or the NOx formation process inside large scale furnace can be captured in unsteady simulation using large scale computer like present study.

Program

Feb 18, 2019 (Day 1)

Registration

Opening

Introduction of Symposium

Welcome Address

Message from MEXT (Chair: Yutaka Ishikawa)

Japan’s Policy in Promotion of High Performance Computing

Report from R-CCS (Chair: Makoto Tsubokura)

K and Post-K

Invited Talk (Chair: Makoto Tsubokura)

Turbulent Combustion Simulation and In Situ Analytics on Titan and Summit with S3D-Legion

Group Photo

Lunch

Distinguished Achievements in K and related to Post-K (1) (Chair: Takemasa Miyoshi)

Beating heart simulation driven by three dimensional molecular dynamics model

Using Artificial Intelligence and Transprecision Computing for Accelerating Finite-Element Urban Earthquake Simulation

Advancement of meteorological and global environmental predictions utilizing observational “Big Data”

Break

Distinguished Achievements in K and related to Post-K (2) (Chair: Takahito Nakajima)

Accelerated Development of Innovative Clean Energy Systems : Post-K Project Priority Issue 6

Massively parallel density matrix renormalization group method algorithm for two-dimensional strongly correlated systems and its applications

Validation of alternative technology by direct turbulence simulation for towing tank experiment

Break

Panel Discussion

Panel Discussion: From K to Post-K

Poster

See accepted posters below.

Reception

Feb 19, 2019 (Day 2)

Registration

Invited Talk (Chair: Florence Tama)

HPC for biomaterials: why playing soccer hurts

Exascale Computing, Artificial Intelligence and Cancer

BREAK

Distinguished Achievements in K and related to Post-K (3) (Chair: Yasumichi Aoki)

Revealing Drug-Target Binding Pathway using Two-dimensional Replica-Exchange Molecular Dynamics Method

First-principles sampling simulation approaches to battery science and technology

Statistical Computational Cosmology with Big Astronomical Data

Lunch

Invited Talk (Chair: Mitsuhisa Sato)

Tianhe-3 and the Exascale Road in China

EPI, Europe strikes back on HPC

Arm in HPC

BREAK

Panel Discussion

Towards Arm eco-system

Closing

List of Accepted Posters

Society with health and longevity

Marco Capuccini(Uppsala University)

Chigusa Kobayashi(Riken center for computational science), Yasuhiro Matsunaga(Riken center for computational science), Jaewoon Jung(Riken center for computational science), Yuji Sugita(Riken center for computational science)

Hiraku Oshima(RIKEN), Suyong Re(RIKEN), Yuji Sugita(RIKEN)

Kento Kasahara(RIKEN), Hiraku Oshima(RIKEN), Grzegorz Nawrocki(Michigan State University), Isseki Yu(Maebashi Institute of Technology), Suyong Re(RIKEN), Michael Feig(Michigan State University), Yuji Sugita(RIKEN)

Hiroshi Yamaura(The University of Electro-Communications), Jun Igarashi(Head Office for Information Systems and Cybersecurity, RIKEN), Tadashi Yamazaki(The University of Electro-Communications)

Ai Shinobu(RIKEN Center for Computational Science), Yasuhiro Matsunaga(RIKEN Center for Computational Science), Chigusa Kobayashi(RIKEN Center for Computational Science), Yuji Sugita(RIKEN Center for Computational Science)

Yasuhiro Matsunaga(RIKEN), Yuji Sugita(RIKEN)

Zhe Sun(RIKEN), Jun Igarashi(RIKEN)

Jaewoon Jung(RIKEN), Chigusa Kobayashi(RIKEN), Takaharu Mori(RIKEN), Yuji Sugita(RIKEN)

Kundan Kumar(Indian Institute of Science), Saurabh Kumar Gupta(Indian Institute of Science), Chetan Singh Thakur(Indian Institute of Science)

Saurabh Kumar Gupta(Indian Institute of Science, Bangalore), Kundan Kumar(Indian Institute of Science`), Chetan Singh Thakur(Indian Institute of Science)

Jun Igarashi(RIKEN), Hiroshi Yamaura(The University of Electro- Communications), Tadashi Yamazaki(The University of Electro- Communications)

Kentaro Nomura(RIKEN Center for Computational Science), Yutaka Maruyama(RIKEN Center for Computational Science), Keigo Nitadori(RIKEN Center for Computational Science), Jun Makino(RIKEN Center for Computational Science)

Disaster prevention and global climate

Koji Terasaki(RIKEN, R-CCS), Takemasa Miyoshi(RIKEN, R-CCS)

Hazuki Arakida(RIKEN), Shunji Kotsuki(RIKEN), Shigenori Otsuka(RIKEN), Yohei Sawada(Meteorological Research Institute, RIKEN), Takemasa Miyoshi(RIKEN)

Kohei Takatama(RIKEN), Kenta Kurosawa(RIKEN), Yusuke Uchiyama(Kobe University), Takemasa Miyoshi(RIKEN)

Yasumitsu Maejima(RIKEN), Shigenori Otsuka(RIKEN), Takemasa Miyoshi(RIKEN)

James Taylor(RIKEN), Guo-Yuan Lien(RIKEN), Shinsuke Satoh(NICT), Takemasa Miyoshi(RIKEN), Yasumitsu Maejima(RIKEN)

Takumi Honda(RIKEN), Guo-Yuan Lien(CWB), Takemasa Miyoshi(RIKEN)

Shigenori Otsuka(RIKEN), Taeka Awazu(KS Solutions), Takemasa Miyoshi(RIKEN)

Kenta Kurosawa(RIKEN), Shunji Kotsuki(RIKEN), Takemasa Miyoshi(RIKEN)

Takashi Sekiya(JAMSTEC), Kazuyuki Miyazaki(JAMSTEC/JPL-CalTech), Koji Ogochi(JAMSTEC), Kengo Sudo(Nagoya University/JAMSTEC), Masayuki Takigawa(JAMSTEC), Henk Eskes(KNMI), Folkert Boersma(KNMI)

Marimo Ohhigashi(RIKEN), Shunji Kotsuki(RIKEN), Shohei Takino(Tokyo Electric Power Company Holdings), Takemasa Miyoshi(RIKEN)

Takamasa Iryo(Kobe University), Kazuki Fukuda(Kobe University), Junji Urata(The University of Tokyo), Genaro Jr. Peque(Kobe University), Lalith Wijerathne(The University of Tokyo), Wasuwat Petprakob(The University of Tokyo)

Genaro Jr Peque(Kobe University), Hiro Harada(Kobe University), Takamasa Iryo(Kobe University)

Sachiho Adachi(RIKEN), Seiya Nishizawa(RIKEN), Kazuto Ando(RIKEN), Tsuyoshi Yamaura(RIKEN), Ryuji Yoshida(NOAA Earth System Research Laboratory (ESRL)), Hisashi Yashiro(RIKEN), Yoshiyuki Kajikawa(RIKEN), Hirofumi Tomita(RIKEN)

Yoshiyuki Kajikawa(RIKEN), Kazuto Ando(RIKEN), Sachiho Adachi(RIKEN), Seiya Nishizawa(RIKEN), Tsuyoshi Yamaura(RIKEN)

Hisashi Yashiro(RIKEN Center for Computational Science), Hirofumi Tomita(RIKEN Center for Computational Science)

Hiromu Seko(Meteorological Research Institute, Japan Agency for Marine-Earth Science and Technology), Wataru Mashiko(Meteorological Research Institute), Sho Yokota(Meteorological Research Institute), Tetsurou Tamura(Tokyo Institute of Technology), Hiroshi Niino(The University of Tokyo)