Invited Speakers
Satoshi Matsuoka
Director, R-CCS, RIKEN
Fugaku – A Centerpiece for the Japanese Society 5.0
Abstract
Fugaku is not only one of the first ‘exascale’ supercomputers in the world, but is also slated to be the centerpiece for the rapid realization of the so-called Japanese ‘Society 5.0’, as defined by Japanese national S&T policy. Indeed, the computing capacity of Fugaku is massive, almost equaling the aggregate compute capability of all the servers (including those in the cloud) installed in Japan (approximately 300,000 units). At the same time, it is a pinnacle of the Arm ecosystem: it is software-compatible with the billions of Arm processors sold worldwide in everything from smartphones to refrigerators, and it will run a standard software stack, just as x86 servers do. As such, Fugaku’s immense power is directly applicable not only to traditional scientific simulation applications, but also to Society 5.0 applications that encompass the convergence of HPC, AI, and Big Data, as well as of cyber (IDC and network) and physical (IoT) space, with immediate societal impact. A series of projects and developments have started at R-CCS and with our partners to facilitate such Society 5.0 usage scenarios on Fugaku.
Biography
Ph.D. from the University of Tokyo in 1993. Full Professor at the Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, since 2000, and director of the joint AIST-Tokyo Tech Real World Big Data Computing Open Innovation Laboratory (RWBC-OIL) since 2017. Director of R-CCS, along with Specially Appointed Professor duty at Tokyo Tech, starting in 2018. Leader of the TSUBAME series of supercomputers, which won world #1 in power-efficient computing. Has led various major supercomputing research projects in areas such as parallel algorithms and programming, resilience, green computing, and convergence of big data/AI with HPC. Has written over 500 articles and chaired numerous ACM/IEEE conferences, including serving as Program Chair of the ACM/IEEE Supercomputing Conference (SC13). A Fellow of the ACM and of the European ISC, he has won many awards, including the JSPS Prize from the Japan Society for the Promotion of Science in 2006, presented by His Highness Prince Akishino; the ACM Gordon Bell Prize in 2011; the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2012; the 2014 IEEE-CS Sidney Fernbach Memorial Award, the most prestigious award in the field of HPC; and, most recently, the ACM HPDC 2018 Achievement Award.
Jeffrey Vetter
Oak Ridge National Laboratory
Preparing for Extreme Heterogeneity in High Performance Computing
Abstract
While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as heterogeneous cores, deep memory hierarchies, non-volatile memory (NVM), and near-memory processing, have emerged as possible solutions to address concerns about energy efficiency and cost. However, we expect this ‘golden age’ of architectural change to lead to extreme heterogeneity, which will have a major impact on software systems and applications. Software will need to be redesigned to exploit these new capabilities and provide some level of performance portability across these diverse architectures. In this talk, I will sample these emerging technologies, discuss their architectural and software implications, and describe several new approaches (e.g., domain specific languages, intelligent runtime systems) to address these challenges.
Biography
Jeffrey Vetter, Ph.D., is a Distinguished R&D Staff Member at Oak Ridge National Laboratory (ORNL). At ORNL, Vetter is the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division, and the founding director of the Experimental Computing Laboratory (ExCL). Vetter earned his Ph.D. in Computer Science from the Georgia Institute of Technology. Vetter is a Fellow of the IEEE, and a Distinguished Scientist Member of the ACM. In 2010, Vetter, as part of an interdisciplinary team from Georgia Tech, NYU, and ORNL, was awarded the ACM Gordon Bell Prize. In 2015, Vetter served as the SC15 Technical Program Chair. His recent books, entitled "Contemporary High Performance Computing: From Petascale toward Exascale (Vols. 1-3)," survey the international landscape of HPC. Learn more at https://ft.ornl.gov/~vetter/.
Evelyne Foerster
CEA
Modelling strategies for Nuclear Probabilistic Safety Assessment in case of natural external events
Abstract
The methodology for Probabilistic Safety Assessment (PSA) of Nuclear Power Plants (NPPs) is used to better understand the most probable initiators of nuclear accidents by identifying potential accident scenarios, their consequences, and their probabilities. However, in the case of external hazard events (earthquakes, tsunamis, flooding, extreme weather, etc.), the challenge is to have efficient tools and strategies for estimating the possible consequences on the NPP Systems, Structures & Components (SSCs), combined with a suitable treatment of uncertainty, to support risk-informed decision making related to the plant operating states. In this presentation, we give an overview of some modelling strategies (e.g. HPC, model reduction) being developed to assess the probabilistic response of the SSCs and of the plant to external events.
Biography
Dr. Evelyne Foerster is currently the Head of the Seismic Mechanics Study (EMSI) Lab at CEA (http://www.cea.fr/english), Director of the French SEISM Institute at Paris-Saclay University and Coordinator of the H2020 NARSIS project (http://www.narsis.eu/).
She has contributed broadly to research and development related to numerical methods and computing tools for geohazards and civil engineering, including high performance computing for large-scale risk assessment, and has extensive experience in coordinating national and European research projects (https://www.linkedin.com/in/evelyne-foerster-6b445b13/ , https://www.researchgate.net/profile/Evelyne_Foerster).
Norman Christ
Columbia University
Lattice QCD at the Exascale
Abstract
The strong interactions of the quarks and gluons that make up the atomic nucleus can now be described with 1% accuracy by solving the fundamental equations of quantum chromodynamics (QCD) on high-performance computers. Continued innovation in numerical algorithms and huge advances in computer technology have vastly expanded the physics reach of these lattice QCD calculations. We will discuss some of these algorithms, the challenges that current supercomputer architectures pose for lattice QCD, and examples of new physics directions that can now be explored using lattice QCD.
Biography
Prof. Christ began work on lattice QCD in the 1980s, designing and constructing some of the first parallel computers purpose-built for lattice QCD, a design direction culminating in the series of IBM Blue Gene machines. His physics calculations have evolved from studying QCD at finite temperature on the earliest QCD computers to increasingly sophisticated calculations, with his collaborators in the RBC and UKQCD Collaborations, focused on CP violation in the standard model of particle physics and other searches for new phenomena enabled by lattice QCD calculations of the standard model predictions for rare processes, highly sensitive to new physics.
Takaki Hatsui
SPring-8, RIKEN
New opportunities in photon science with high-speed X-ray imaging detector Citius, and associated data challenge
Abstract
X-ray science based on large-scale accelerators, often called photon science, is now entering a new era owing to the development of highly brilliant synchrotron radiation sources based on multi-bend achromat lattices, and of high-speed X-ray imaging detectors. Among these detectors, Citius is one of the first developed for this purpose. The raw data rate will reach 10 Tbps when the system is configured as a 20-Mpixel detector (30 x 30 cm2) operating at 17 kfps. This data stream will be processed by a data processing pipeline.
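The quoted Citius figures can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal SI prefixes and a raw rate spread evenly over every pixel of every frame. Neither assumption comes from the abstract. It derives the pixel sample rate, the average raw bits per pixel implied by 10 Tbps, and the volume accumulated in one hour of continuous readout. The abstract does not give the detector's actual pixel format, so the bits-per-pixel value is only an average implied by the quoted numbers.

```python
# Assumptions (not stated in the abstract): decimal SI prefixes, and the
# 10 Tbps raw rate spread evenly over every pixel of every frame.
PIXELS = 20e6          # 20 Mpixels
FRAME_RATE = 17e3      # 17 kfps
RAW_RATE_BPS = 10e12   # 10 Tbps


def pixel_rate(pixels: float, fps: float) -> float:
    """Pixel samples read out per second."""
    return pixels * fps


def implied_bits_per_pixel(rate_bps: float, pixels: float, fps: float) -> float:
    """Average raw bits per pixel sample implied by the quoted data rate."""
    return rate_bps / pixel_rate(pixels, fps)


def bytes_per_hour(rate_bps: float) -> float:
    """Raw volume in bytes accumulated over one hour of continuous readout."""
    return rate_bps / 8 * 3600


if __name__ == "__main__":
    print(f"pixel samples/s: {pixel_rate(PIXELS, FRAME_RATE):.2e}")
    print(f"bits per pixel : {implied_bits_per_pixel(RAW_RATE_BPS, PIXELS, FRAME_RATE):.1f}")
    print(f"PB per hour    : {bytes_per_hour(RAW_RATE_BPS) / 1e15:.2f}")
```

Under these assumptions the script reports about 3.4e11 pixel samples per second, roughly 29 bits per pixel, and 4.5 PB per hour. That hourly volume is the scale of the data challenge the talk title refers to.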
Biography
Dr. Takaki Hatsui received his Ph.D. in science from the Graduate School for Advanced Studies in 1999 for soft X-ray studies using synchrotron radiation. After working as a postdoctoral fellow, he joined the Institute for Molecular Science as a research associate and an assistant professor, working in the field of soft X-ray science and instrumentation. In 2007, he joined the XFEL Project Head Office of RIKEN, leading the development of detectors and data acquisition systems. In 2013, he started developing X-ray imaging detectors for synchrotron radiation facilities.
Jean-Marc Denis
European Processor Initiative
EPI: the European approach for the Exascale age
Abstract
The rise of artificial intelligence in HPC, together with the data deluge and the transition from monolithic applications toward complex workflows, is leading the HPC community, especially hardware architects, to reconsider how Exascale supercomputers are designed. Furthermore, the advent of hybrid computing, defined as the combination of in-house and cloud computing, significantly impacts design metrics.
In this presentation, the transition from existing homogeneous architectures to Exascale-class modular architectures is discussed. The consequences for the compute components, including the general-purpose processor and the associated accelerators, are addressed. Ultimately, all these considerations lead to the guidelines that have governed the design of the European microprocessor that will power the European Exascale supercomputers.
Biography
After five years of research as a mathematician at Matra Defense (France) from 1990 to 1995, developing new solvers for Maxwell's equations, Jean-Marc Denis held several technical positions in the HPC industry from 1995 to 2004, ranging from HPC pre-sales to Senior Solution Architect. Since 2004, Jean-Marc has worked at Bull SAS headquarters (France), where he started the HPC activity with a few other people. At that time, Bull was not present in the HPC sector.
In less than 10 years, the HPC revenue at Bull exploded from nothing in 2004 to 200M€ in 2015, making Bull the undisputed leader of the European HPC industry and the fourth in the world.
From 2011 to the end of 2016, Jean-Marc led the worldwide business activity, with the goal of consolidating the ATOS/Bull position in Europe and making ATOS/Bull a worldwide leader in Extreme Computing, with a footprint in the Middle East, Asia, Africa and South America.
In 2016 and 2017, Jean-Marc was in charge of defining the strategy for the BigData Division at ATOS/Bull. In this position, his role was to define the global approach for the different BigData business lines, covering HPC, Legacy (mainframe), Enterprise computing, DataScience consulting and Software.
Since the beginning of 2018, Jean-Marc has been head of Strategy and Plan at Atos/Bull, in charge of the global cross-Business Unit strategy and of defining the three-year business plan.
Since mid-2018, Jean-Marc has also been elected Chair of the Board of the European Processor Initiative (EPI).
The general objective of the European Processor Initiative (EPI) partnership is to design a roadmap for future European low power processors for extreme scale computing, high-performance big-data and emerging applications like automotive and other fields that require a highly efficient processing infrastructure.
More precisely, EPI aims at establishing a roadmap to reach three fundamental goals:
1) Developing low-power processor technology to be included in the European Exascale supercomputers in 2023-2024;
2) Ensuring that a significant part of that technology is European;
3) Ensuring that the application areas of the technology are not limited only to HPC, but cover other areas, thus ensuring the economic viability of the initiative.
Ding Zhaohui
Huawei HPC Lab
(Remote Presentation)
Huawei’s ARM HPC Software Update
Abstract
This talk will introduce the technical highlights and the roadmap of Huawei's HPC software stack. The software stack includes: the UCG framework, proposed by Huawei and contributed to the OpenUCX community, under which ARM-optimized collective communication algorithms are implemented; the HCC (Huawei Cloud Compiler) and the MAL (Math Acceleration Library), with optimization features for ARM-based HPC; and the Donau Scheduler (Huawei’s homegrown cluster scheduler) for new HPC workloads.
Biography
Zhaohui Ding obtained a PhD from the College of Computer Science of Jilin University, China, in 2009. Zhaohui researched grid computing at SDSC as a visiting scholar, then joined Platform Computing Inc. (acquired by IBM in 2012) and contributed to multiple generations of LSF products. He is currently the director of the HPC Lab at Huawei, responsible for leading the research and development of the HPC software stack. In his career, he has published over ten peer-reviewed scholarly papers.
Toshiyuki Shimizu
Fujitsu
Supercomputer "Fugaku" optimized for application performance and energy efficiency
Abstract
The achievements and status of the supercomputer "Fugaku", formerly known as Post-K, Japan’s new national supercomputer, will be discussed. Fugaku targets up to 100 times higher application performance than the K computer, with superior power efficiency. Fugaku employs the newly developed FUJITSU A64FX CPU, featuring the Armv8-A instruction set architecture and the Scalable Vector Extension (SVE), to widen application opportunities. Fugaku contributes to the Arm ecosystem for HPC applications as well as to science and society.
Biography
Mr. Toshiyuki Shimizu is Senior Director, Platform Development Unit, at Fujitsu Limited. Mr. Shimizu has been deeply and continuously involved in the development of scalar parallel supercomputers, large SMP enterprise servers, and x86 cluster systems. His primary research interest is in interconnect architecture, most recently culminating in the development of the Tofu interconnect for the K computer and PRIMEHPC series.
He leads the development of Fujitsu’s high-end supercomputer PRIMEHPC series and the Fugaku supercomputer formerly known as Post-K. Mr. Shimizu received his Masters of Computer Science degree from Tokyo Institute of Technology in 1988.
Mike Heroux
Sandia National Laboratories
(Remote Presentation)
The Extreme-scale Scientific Software Stack for Collaborative Open Source Software
Abstract
Open source, community-developed reusable scientific software represents a large and growing body of capabilities. Linux distributions, vendor software stacks and individual disciplined software product teams provide the scientific computing community with usable holistic software environments containing core open source software components. At the same time, new software capabilities make it into these distributions in a largely ad hoc fashion.
The Extreme-scale Scientific Software Stack (E4S), first announced in November 2018, along with its community-organized scientific software development kits (SDKs), is a new community effort to create lightweight cross-team coordination of scientific software development, delivery and deployment, and a set of support tools and processes targeted at improving scientific software quality via improved practices, policy, testing and coordination.
E4S (https://e4s.io), which announced the release of Version 1.0 in November 2019, is an open architecture effort, welcoming teams that are developing technically compatible and high-quality products to participate in the community. E4S and the SDKs are sponsored by the US Department of Energy Exascale Computing Project (ECP), driven by our need to effectively develop, test, deliver and deploy our open source software products on next-generation platforms for the scientific community.
In this presentation, we introduce E4S, discuss its design and implementation goals and show examples of success and challenges so far. We will also discuss our connection with other key community efforts we rely upon for our success and describe how collaboration around E4S can be realized.
Biography
Michael Heroux is a Senior Scientist at Sandia National Laboratories, Director of SW Technologies for the US DOE Exascale Computing Project (ECP) and Scientist in Residence at St. John’s University, MN. His research interests include all aspects of scalable scientific and engineering software for new and emerging parallel computing architectures.
He leads several projects in this field: ECP SW Technologies is an integrated effort to provide the software stack for ECP. The Trilinos Project (2004 R&D 100 winner) is an effort to provide reusable, scalable scientific software components. The Mantevo Project (2013 R&D 100 winner) is focused on the development of open source, portable mini-applications and mini-drivers for the co-design of future supercomputers and applications. HPCG is an official TOP 500 benchmark for ranking computer systems, complementing LINPACK.
Subhasish Mitra
Stanford University
(Remote Presentation)
Abundant-Data Computing: The N3XT 1,000X
Abstract
The world’s appetite for analyzing massive amounts of data is growing dramatically. The computation demands of these abundant-data applications, such as machine learning, far exceed the capabilities of today’s computing systems, and can no longer be met by isolated improvements in transistor technologies, memories or integrated circuit architectures alone. One must create transformative NanoSystems which exploit unique properties of underlying nanotechnologies to implement new architectures.
This talk will present the N3XT (Nano-Engineered Computing Systems Technology) approach to such NanoSystems through: (i) new computing system architectures leveraging emerging device (logic and memory) nanotechnologies and their dense 3D integration with fine-grained connectivity for computation immersed in memory, (ii) new logic devices (such as carbon nanotube field-effect transistors for implementing high-speed and low-energy logic circuits) as well as high-density non-volatile memory (such as resistive RAM that can store multiple bits inside each memory cell), amenable to (iii) ultra-dense (monolithic) 3D integration of thin layers of logic and memory devices that are fabricated at low temperature.
A wide variety of N3XT hardware prototypes (built in commercial and research facilities) represent leading examples of transforming scientifically-interesting nanomaterials and nanodevices into actual NanoSystems. N3XT NanoSystems target 1,000X system-level energy-delay product benefits especially for abundant-data applications. Such massive benefits enable a wide range of applications that push new frontiers, from deeply-embedded computing systems all the way to very large-scale systems.
Biography
Subhasish Mitra is Professor of Electrical Engineering and of Computer Science at Stanford University. He directs the Stanford Robust Systems Group, co-leads the Computation focus area of the Stanford SystemX Alliance, and is a faculty member of the Wu Tsai Neurosciences Institute. Prof. Mitra also holds the Carnot Chair of Excellence in NanoSystems at CEA-LETI in Grenoble, France. His research ranges across robust computing, NanoSystems, Electronic Design Automation, and neurosciences. Results from his research group have been widely deployed by industry and have inspired significant development efforts by government and research organizations in multiple countries.
Jointly with his students and collaborators, Prof. Mitra demonstrated the first carbon nanotube computer and the first three-dimensional NanoSystem with computation immersed in data storage. These demonstrations received widespread recognition: the cover of NATURE, a Research Highlight to the United States Congress by the National Science Foundation, and highlighting as an "important, scientific breakthrough" by news organizations around the world.
In the field of robust computing, Prof. Mitra and his students created key approaches for soft error resilience, circuit failure prediction, on-line self-test and diagnostics, and QED (Quick Error Detection) design verification and system validation. His earlier work on X-Compact test compression at Intel Corporation has proven essential to cost-effective manufacturing and high-quality testing of almost all electronic systems across the industry. X-Compact and its derivatives have been implemented in widely-used commercial Electronic Design Automation tools.
Prof. Mitra's honors include the ACM SIGDA / IEEE CEDA Newton Technical Impact Award in Electronic Design Automation (a test of time honor), the Semiconductor Research Corporation's Technical Excellence Award (for innovation that significantly enhances the semiconductor industry), the Intel Achievement Award (Intel’s highest corporate honor), and the United States Presidential Early Career Award for Scientists and Engineers from the White House. He and his students have published award-winning papers at major venues: ACM/IEEE Design Automation Conference, IEEE International Solid-State Circuits Conference, ACM/IEEE International Conference on Computer-Aided Design, IEEE International Test Conference, IEEE Transactions on CAD, IEEE VLSI Test Symposium, and the Symposium on VLSI Technology. At Stanford, he has been honored several times by graduating seniors "for being important to them during their time at Stanford."
Prof. Mitra has served on the Defense Advanced Research Projects Agency's (DARPA) Information Science and Technology Board as an invited member. He is a Fellow of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).
Ahmed Hemani
Royal Institute of Technology
Synchoros VLSI Design Style: A Solution for the Post-Moore Era
Abstract
In the post-Moore era, we will need to efficiently create fully customized complex machines that exploit all available technology options to go beyond what Moore’s law has enabled. This would include memristor-based computation in memory, plasmonics, and 3D DRAM vaults combined with conventional CMOS. Conventional standard-cell-based design flows will not suffice to deal with such diversity and complexity. We propose a new design style, called the synchoros VLSI design style, to replace standard-cell-based VLSI design flows. Synchoricity, derived from the Greek word choros, meaning space, is the spatial analog of synchronicity. In synchronicity, we discretize time to enable temporal composition by abutment. In synchoricity, we discretize space to enable spatial composition by abutment. The synchoros VLSI design style has been used to design Lego-like bricks, called SiLago, that replace conventional standard cells as the atomic building blocks. SiLago blocks compose by abutment to create valid, ready-to-manufacture designs without the end user having to perform logic and physical synthesis. We show how a synchoros SiLago-based platform can automate custom hybrid designs, coping both with the diversity of technologies and with extreme complexity.
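The idea of composition by abutment on a discretized grid can be shown with a toy model. The sketch below is a hypothetical illustration only and does not reproduce the actual SiLago rules. It assumes that every brick occupies a whole number of grid cells and exposes an interface label on its left and right edges. Under those assumptions, placing a row of bricks needs no search: each brick's position is the running sum of the widths before it, and validity reduces to local checks between neighbours. The `Brick` type, the edge labels and the example brick names are all made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """Hypothetical grid-aligned building block (toy stand-in, not real SiLago)."""
    name: str
    width: int   # footprint in whole grid cells (space is discretized)
    height: int  # footprint in whole grid cells
    west: str    # interface label on the left edge
    east: str    # interface label on the right edge


def compose_row(bricks: list[Brick]) -> list[int]:
    """Compose bricks left to right by abutment and return their x offsets.

    Because every footprint is a whole number of grid cells, each offset is
    just the sum of the widths before it; no placement or routing search is
    needed. Raises ValueError when neighbours cannot abut: different heights
    (edges would not line up) or mismatched facing interfaces.
    """
    for left, right in zip(bricks, bricks[1:]):
        if left.height != right.height:
            raise ValueError(f"{left.name}|{right.name}: height mismatch")
        if left.east != right.west:
            raise ValueError(f"{left.name}|{right.name}: interface mismatch")
    offsets, x = [], 0
    for brick in bricks:
        offsets.append(x)
        x += brick.width
    return offsets


if __name__ == "__main__":
    row = [
        Brick("dpu", 2, 1, west="in", east="bus"),
        Brick("reg", 1, 1, west="bus", east="bus"),
        Brick("mem", 3, 1, west="bus", east="out"),
    ]
    print(compose_row(row))  # [0, 2, 3]
```

The toy mirrors the analogy in the abstract. A synchronous design checks timing locally per clock period instead of globally. In the same way, a grid-aligned design can check validity locally at each shared edge instead of re-synthesizing the whole layout.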
Biography
Ahmed Hemani is Professor in Electronic Systems Design at the School of ICT, KTH, Kista, Sweden. His current research interests are massively parallel architectures and design methods, and their applications to scientific computing and to autonomous embedded systems inspired by the brain. In the past, he contributed to high-level synthesis: his doctoral thesis was the basis for the first high-level synthesis product introduced by Cadence, called Visual Architect. He also pioneered the Networks-on-Chip concept and has contributed to clocking and low-power architectures and design methods. He has worked extensively in industry, including at National Semiconductor, ABB, Ericsson, Philips Semiconductors and Newlogic. He has also been part of three start-ups.
Haohuan Fu
Tsinghua University/National Supercomputing Center in Wuxi
(Remote Presentation)
Super AI on Supercomputers
Abstract
As a key tool in solving major science and engineering challenges, supercomputers have long been considered an important national information infrastructure and an indicator of a country's technology and innovation capabilities. In this talk, we will introduce Sunway TaihuLight, which was the world's first system with a peak performance greater than 100 PFlops and a parallel scale of over 10 million cores. Unlike other existing heterogeneous supercomputers, the system adopts unique design strategies both in the architecture of its 260-core Shenwei CPU and in the way it integrates 40,960 such CPUs into 40 powerful cabinets. This talk will first discuss the design philosophy behind integrating these 10 million cores. We then describe how we use these 10 million cores to contribute to both science and the economy. We can perform simulations at unprecedented scale and resolution to study climate change, to enable intelligent agriculture management, and to evaluate the natural risks that we face. This strong computing power also enables various forms of artificial intelligence and has brought value to industries ranging from new energy to new medicine.
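The "over 10 million cores" figure follows directly from the building blocks quoted above. The minimal arithmetic check below uses only the abstract's numbers. The per-cabinet values rest on an assumption, not stated in the abstract, that the CPUs are divided evenly among the 40 cabinets.

```python
# Figures quoted in the abstract.
CORES_PER_CPU = 260
CPUS = 40_960
CABINETS = 40

total_cores = CORES_PER_CPU * CPUS
# An even split across cabinets is an assumption, not stated in the abstract.
cpus_per_cabinet = CPUS // CABINETS
cores_per_cabinet = total_cores // CABINETS

print(f"total cores      : {total_cores:,}")        # 10,649,600
print(f"CPUs per cabinet : {cpus_per_cabinet:,}")   # 1,024
print(f"cores per cabinet: {cores_per_cabinet:,}")  # 266,240
```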
Biography
Haohuan Fu is a professor in the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science at Tsinghua University, where he leads the High Performance Geo-Computing (HPGC) research group. He is also the deputy director of the National Supercomputing Center in Wuxi, leading its research and development division. Fu has a PhD in computing from Imperial College London. His research focuses on providing both the most efficient simulation platforms and the most intelligent data management and analysis platforms for geoscience applications, leading to two consecutive ACM Gordon Bell Prizes (for a nonhydrostatic atmospheric dynamic solver in 2016, and for nonlinear earthquake simulation in 2017).
Andy Hock
PhD/Director, Cerebras Systems, Inc.
Supercomputer-Scale AI with Cerebras Systems
Abstract
AI has great potential for society, but is compute-limited today. Researchers continue to make progress in deep learning by training larger models with larger datasets. But training these models often takes days or weeks -- this is costly and constrains development. And this challenge is growing. We need a new type of processor to accelerate computation for deep learning and AI.
In this talk we will discuss the Cerebras approach to speeding up training and reducing time to solution with the Cerebras Wafer Scale Engine -- the largest chip in the world -- and the CS-1 system for the datacenter. This accelerator provides cluster-scale compute resources in a single server and is easily programmable with current ML frameworks -- it is a platform designed to accelerate AI compute by orders of magnitude and enable a faster path to a smarter "Society 5.0."
Biography
Dr Andy Hock is the Director of Product at Cerebras Systems, a company building a new class of computer system to accelerate deep learning and artificial intelligence by orders of magnitude beyond the traditional processors of today.
Andy came to Cerebras from Google, where he led Data and Analytics Product for the Terra Bella project, using deep learning and AI to create useful data for maps and enterprise applications from satellite imagery.
While there, Andy observed both great potential for AI across many applications -- from computer vision to language and sequential data processing, to search and recommendation, as well as fundamental research for health and science -- and a great challenge: AI computing takes too long. Modern deep learning models often take days or weeks of compute time to train today, even on large clusters of graphics processors. We need a better compute solution to unlock AI research and bring better solutions to market sooner.
At Cerebras, Andy saw an opportunity to help solve this problem and deliver the right compute solution for deep learning and AI.
Before Google, Andy was a Senior Scientist and Senior Technical Program Manager at Arete Associates, where he led research for image processing algorithm development. He has a PhD in Geophysics and Space Physics from UCLA and a BA in Astronomy-Physics from Colgate University.
LinkedIn: https://www.linkedin.com/in/andyhock