RIKEN Center for Computational Science

R-CCS Cafe

R-CCS Cafe is a place where R-CCS researchers can informally discuss their research beyond the boundaries of their own disciplines, helping to integrate different fields. R-CCS Cafe is held twice a month. All who are interested are welcome to attend.

  • Purpose: To provide a forum for researchers to exchange ideas and information, with the goal of facilitating interdisciplinary collaboration and developing new research fields.
  • Place: Lecture Hall (6th floor) or Seminar Room (1st floor) at R-CCS
  • Language: Presentations will be in Japanese or English. Slides will be in English.

Please make your presentation understandable to researchers in other fields. Questions and active discussion are encouraged.

The 158th R-CCS Cafe-part II
Date and Time: Fri. Jan. 11, 2019, 14:00 - 15:00
Place: Lecture Hall (6th floor) at R-CCS

Title: An Innovative Method for Integration of Simulation/Data/Learning in the Exascale/Post-Moore Era
Speaker: Kengo Nakajima (Deputy Director, R-CCS)

Presentation Language: English
Presentation Material: English

Abstract:

"ppOpen-HPC" is an open source infrastructure for development and execution of optimized and reliable simulation code on post-peta-scale (pp) parallel computers based on many-core architectures, and it consists of various types of libraries, which cover general procedures for scientific computation. Source code developed on a PC with a single processor is linked with these libraries, and the parallel code generated is optimized for post-peta-scale systems with manycore architectures, such as the Oakforest-PACS system of Joint Center for Advanced High Performance Computing (JCAHPC). "ppOpen-HPC" is part of a five-year project (FY.2011-2015) spawned by the "Development of System Software Technologies for Post-Peta Scale High Performance Computing" funded by JST-CREST. The framework covers various types of procedures for scientific computations, such as parallel I/O of data-sets, matrix-assembly, linear- solvers with practical and scalable preconditioners, visualization, adaptive mesh refinement and dynamic load-balancing, in various types of computational models, such as FEM, FDM, FVM, BEM and DEM. Automatic tuning (AT) technology enables automatic generation of optimized libraries and applications under various types of environments. We release the most updated version of ppOpen-HPC as open source software every year in November (2012-2015), which is available at http://ppopenhpc.cc.u-tokyo.ac.jp/ppopenhpc/ . In 2016, the team of ppOpen-HPC joined ESSEX-II (Equipping Sparse Solvers for Exascale) project (Leading P.I. Professor Gerhard Wellein (University of Erlangen-Nuremberg)), which is funded by JST-CREST and the German DFG priority programme 1648 "Software for Exascale Computing" (SPPEXA) under Japan (JST)-Germany (DFG) collaboration until FY.2018. In ESSEX-II, we develop pK-Open-HPC (extended version of ppOpen-HPC, framework for exa-feasible applications), preconditioned iterative solvers for quantum sciences, and a framework for automatic tuning (AT) with performance model. In the presentation, various types of achievements of ppOpen-HPC, ESSEX-II, and pK-OpenHPC project, such as applications using HACApK library for H-matrix computation, coupling simulations by ppOpen-MATH/MP, and parallel preconditioned iterative solvers will be shown. Supercomputing in the Exa-scale and the Post-Moore Era is inherently different from that in the Peta- scale Era and before. Although supercomputers have been the essential tool for computational science in recent 30 years, they are now used for other purposes, such as data analytics, and machine learning. Architecture of the next generation supercomputing system is essentially heterogeneous for these multiple purposes (simulations + data + learning). We propose a new innovative method for integration of computational and data science (Big Data & Extreme Computing, BDEC) for sustainable promotion of new scientific discovery by supercomputers in the Exa-Scale/Post-Moore Era with heterogeneous architecture. "h3-Open-BDEC (h3: hierarchical, hybrid, heterogeneous,)" is an open source infrastructure for development and execution of optimized and reliable codes for BDEC on such supercomputers, which is the extended version of ppOpen-HPC. In this presentation, we will overview the h3-Open-BDEC, and the target supercomputer system, which will start operation in April 2021.

The 157th R-CCS Cafe-part I
Date and Time: Thu. Jan. 10, 2019, 14:00 - 15:00
Place: Lecture Hall (6th floor) at R-CCS

Title: Multigrid for structured grids on large-scale parallel computers
Speaker: Prof. Dr. Matthias Bolten (University of Wuppertal, High Performance Computing / Software Engineering)

Presentation Language: English
Presentation Material: English

Abstract:

In many simulations in computational science and engineering, a partial differential equation has to be solved. Multigrid methods are among the fastest methods for this task, in many cases with optimal, i.e., O(N), complexity. As a consequence, simulations of huge problems on large-scale supercomputers often use a multigrid method. If the underlying problem is formulated on a structured grid, this structure can be exploited in the multigrid method to build up the grid hierarchy. Additionally, the presence of structure allows for a relatively straightforward, efficient implementation on modern computer architectures such as current CPUs or GPUs. Further, structure allows for a rigorous analysis of the problem and of the multigrid method used to solve it. Still, the multigrid components used, i.e., grid transfer operators and smoothers, have to be chosen carefully to treat the underlying problem properly. Besides the adaptation to the problem, the chosen components can have a huge influence on the serial efficiency and the parallel scalability of the whole method. In this talk, multigrid methods for structured grids, their analysis, and the specific choice of algorithmic components for parallel computers will be discussed.
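
To make the structure-exploiting idea concrete, the following is a minimal sketch (not code from the talk) of a recursive V-cycle for the 1D Poisson equation -u'' = f on a uniform grid, using weighted-Jacobi smoothing, full-weighting restriction, and linear-interpolation prolongation. Because the grid is structured, the hierarchy never has to be stored explicitly; each coarser level simply halves the number of interior points.

    #include <cstddef>
    #include <vector>
    using Vec = std::vector<double>;

    // One weighted-Jacobi sweep for -u'' = f on a grid with spacing h (Dirichlet boundaries).
    void smooth(Vec& u, const Vec& f, double h, double omega = 2.0 / 3.0) {
        Vec old = u;
        for (std::size_t i = 1; i + 1 < u.size(); ++i)
            u[i] = (1 - omega) * old[i]
                 + omega * 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i]);
    }

    Vec residual(const Vec& u, const Vec& f, double h) {
        Vec r(u.size(), 0.0);
        for (std::size_t i = 1; i + 1 < u.size(); ++i)
            r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h);
        return r;
    }

    Vec restrict_fw(const Vec& r) {                     // full weighting: fine -> coarse
        Vec rc((r.size() - 1) / 2 + 1, 0.0);
        for (std::size_t i = 1; i + 1 < rc.size(); ++i)
            rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1];
        return rc;
    }

    Vec prolongate(const Vec& ec, std::size_t nfine) {  // linear interpolation: coarse -> fine
        Vec ef(nfine, 0.0);
        for (std::size_t i = 1; i + 1 < ec.size(); ++i) {
            ef[2 * i]     += ec[i];
            ef[2 * i - 1] += 0.5 * ec[i];
            ef[2 * i + 1] += 0.5 * ec[i];
        }
        return ef;
    }

    // One V-cycle: pre-smooth, restrict the residual, recurse, correct, post-smooth.
    void vcycle(Vec& u, const Vec& f, double h, int nu = 2) {
        if (u.size() <= 3) { smooth(u, f, h); return; } // coarsest grid
        for (int s = 0; s < nu; ++s) smooth(u, f, h);   // pre-smoothing
        Vec rc = restrict_fw(residual(u, f, h));
        Vec ec(rc.size(), 0.0);
        vcycle(ec, rc, 2 * h, nu);                      // coarse-grid correction
        Vec ef = prolongate(ec, u.size());
        for (std::size_t i = 0; i < u.size(); ++i) u[i] += ef[i];
        for (int s = 0; s < nu; ++s) smooth(u, f, h);   // post-smoothing
    }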

The 157th R-CCS Cafe-part II
Date and Time: Thu. Jan. 10, 2019, 15:20 - 15:50
Place: Lecture Hall (6th floor) at R-CCS

Title: Multiplicative Schwarz type block multi-color Gauss-Seidel smoother for AMG/GMG method
Speaker: Masatoshi Kawai (HPC Usability Team)

Presentation Language: English
Presentation Material: English

Abstract:

In this talk, we will focus on the multigrid method. The convergence and performance of a multigrid method strongly depend on the smoother. We have proposed a multiplicative Schwarz type block multi-color Gauss-Seidel (MS-BMC-GS) smoother. This smoother has better convergence, a higher cache-hit ratio, and fewer communications than existing methods. In this talk, we introduce the MS-BMC-GS smoother and show numerical evaluations with geometric and algebraic multigrid methods.
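
For context, the sketch below shows a standard two-color (red-black) Gauss-Seidel sweep for the 5-point Laplacian; this is the basic idea that multi-color smoothers build on, not the proposed MS-BMC-GS itself. Points of the same color have no data dependence on each other, so each color can be updated in parallel; the block and multiplicative-Schwarz ingredients of the talk add local subdomain solves and cache blocking on top of this.

    #include <vector>

    // One red-black Gauss-Seidel sweep for the 5-point Laplacian on an n x n grid
    // (row-major storage, Dirichlet boundary values kept in the outer ring).
    void rb_gauss_seidel(std::vector<double>& u, const std::vector<double>& f,
                         int n, double h) {
        for (int color = 0; color < 2; ++color) {      // 0 = "red", 1 = "black"
            #pragma omp parallel for                   // same-color points are independent
            for (int i = 1; i < n - 1; ++i)
                for (int j = 1; j < n - 1; ++j)
                    if ((i + j) % 2 == color)
                        u[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                                             + u[i * n + j - 1] + u[i * n + j + 1]
                                             + h * h * f[i * n + j]);
        }
    }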

The 156th R-CCS Cafe
Date and Time: Fri. Dec. 21, 2018, 15:15 - 16:15
Place: Lecture Hall (6th floor) at R-CCS

Title: Designing Communication Platform for an FPGA Cluster
Speaker: Tomohiro Ueno (Processor Research Team)

Presentation Language: English
Presentation Material: English

Abstract:

A field-programmable gate array (FPGA) is a reconfigurable device on which we can repeatedly implement arbitrary circuits. With an optimized implementation and stream processing, FPGA-based computing achieves both high computing performance and high power efficiency. To further improve performance, it is necessary to build an FPGA cluster with multiple nodes. We have developed a directly connected FPGA cluster and a communication platform for it. In this talk, I introduce the design and structure of the FPGA cluster and how its nodes communicate. I also explain the communication modules and the actual data movement on the FPGA cluster. Finally, I show that the proposed platform achieves fast and flexible communication for various applications on FPGAs.

The 155th R-CCS Cafe-part I
Date and Time: Fri. Dec. 7, 2018, 13:00 - 14:00
Place: Lecture Hall (6th floor) at R-CCS

Title: Applying HPC to mitigate disaster damage by developing and integrating advanced computational science
Speaker: Satoru Oishi (Team Leader, Computational Disaster Mitigation and Reduction Research Team)

Presentation Language: English
Presentation Material: English

Abstract:

The Computational Disaster Mitigation and Reduction Research Team aims to develop advanced large-scale numerical simulations of natural disasters such as earthquakes, tsunamis, floods, and inundation for Kobe City and other urban areas in Hyogo Prefecture. The team integrates geohazards, water hazards, and related hazards. Demand for natural disaster simulations is increasing because disasters occur frequently, so we are developing sets of computer programs that meet this demand. The team is working on the following three research topics.

Urban model development: Research on urban hazards requires urban models that represent the structure and shape of cities in numerical form. However, it takes a very long time to develop urban models consisting of buildings, foundations, and infrastructure such as bridges, ports, and roads. It is therefore indispensable to invent methods that automatically construct urban models from existing data, which are basically ill-structured. The team developed the Data Processing Platform (DPP) for this purpose. Using DPP, we have constructed a nationwide urban model and built 3D models from engineering drawings. Recently, the team started large collaborative research projects with Hanshin Expressway Co., Ltd. and the National Institute for Land and Infrastructure Management (MLIT). Three-dimensional bridge models for the simulation code will be generated automatically from paper-based engineering drawings or 2D CAD data so that the team can simulate the seismic response of the entire network with high-fidelity models. Since paper-based engineering drawings contain errors and missing information, robust model construction cannot be achieved by merely extracting information from the drawings. To tackle this problem, the team has developed a template-based methodology.

Developing particle methods for landslide simulation using FDPS: Conventional mesh-based numerical methods, such as the finite element method (FEM) and the finite difference method (FDM), have difficulty simulating the large deformations and the evolution and breakdown of traction-free surfaces during a landslide. Meshfree methods, such as smoothed particle hydrodynamics (SPH) and the moving particle semi-implicit (MPS) method, are regarded as promising candidates for landslide simulations. Using FDPS, a framework for developing parallel particle simulation codes, we are developing a large-scale landslide simulation code. Since FDPS provides the common routines needed to parallelize a general particle method, we can focus on the numerical schemes and the mechanics of landslides. In this talk, we present an improved mathematical reformulation of MPS (iMRMPS). iMRMPS shows no deterioration of accuracy or convergence for randomly distributed particles, outperforming most conventional particle methods.

Water-related disasters: The frequency of water disasters has increased, and not only water itself but also sediment causes damage to residents and their assets. Understanding possible hazards is necessary for precautionary measures and for reducing damage. The team has therefore started to address water- and sediment-related disasters by building numerical simulation models for river basins in Kobe City and Hyogo Prefecture. Estimating the damage from sediment-related disasters accompanied by flooding, inundation, and sediment supply due to landslides is important for establishing prevention plans. The team has developed a 2D Distributed Rainfall and Sediment Runoff/Inundation Simulator (DRSRIS), which couples a 2D rainfall-runoff model, an inundation flow model, and a sediment transport model on a staggered grid and runs on the supercomputer.

The 155th R-CCS Cafe-part II
Date and Time: Fri. Dec. 7, 2018, 14:00 - 15:00
Place: Lecture Hall (6th floor) at R-CCS

Title: Predictability of the July 2018 Record-breaking Rainfall in Western Japan
Speaker: Takemasa Miyoshi (Team Leader, Data Assimilation Research Team)

Presentation Language: English
Presentation Material: English

Abstract:

Data assimilation combines computer model simulations and real-world data, based on dynamical systems theory and statistical mathematics. Data assimilation addresses the predictability of dynamical systems and has long played a crucial role in numerical weather prediction. The Data Assimilation Research Team (DA Team) has been working on various problems of data assimilation, mainly focusing on weather prediction. In July 2018, a broad area in western Japan was severely damaged by record-breaking heavy rainfall. The DA Team developed real-time regional and global weather forecasting systems and investigated this historic rainfall event using these systems. The DA Team also took the lead in organizing a rapid-response conference for meteorologists in August, about a month after the event, in collaboration with the Computational Climate Science Research Team. In this presentation, we will report the DA Team's recent research progress, focusing mainly on the investigation of the July 2018 rainfall event.
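
For readers outside the field, the core analysis step that most data assimilation schemes build on can be written in its generic textbook (Kalman) form, not as a description of the team's specific systems:

    x^a = x^f + K (y - H x^f),    K = P^f H^T (H P^f H^T + R)^{-1}

where x^f is the model forecast, y the observations, H the observation operator, and P^f and R the forecast and observation error covariances; ensemble methods widely used in weather prediction estimate P^f from an ensemble of forecasts.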

The 155th R-CCS Cafe-part III
Date and Time: Fri. Dec. 7, 2018, 15:15 - 16:15
Place: Lecture Hall (6th floor) at R-CCS

Title: Research Activities for Parallel Programming Models for Current HPC Platforms
Speaker: Jinpil Lee (Architecture Development Team)

Presentation Language: English
Presentation Material: English

Abstract:

In this talk, we introduce two research activities to improve vectorization and performance optimization for state-of-the-art HPC platforms. Recent trends in processor design accommodate wide vector extensions, so SIMD vectorization is more important than before for exploiting the potential performance of the target architecture. The latest OpenMP specification provides new directives that help compilers produce better code for SIMD auto-vectorization. However, it is hard to optimize SIMD code performance in OpenMP because SIMD code generation mostly relies on the compiler implementation. In the first part of the talk, we propose a new directive that specifies user-defined SIMD variants of functions used in SIMD loops. The compiler can then use the user-defined SIMD variants when it encounters OpenMP loops, instead of auto-vectorized variants, and the user can optimize SIMD performance by implementing highly optimized SIMD code with intrinsic functions. A performance evaluation using an image composition kernel shows that our approach lets the user control SIMD code generation explicitly: the user-defined function reduces the number of instructions by 70% compared with the auto-vectorized code generated from the serial code.

In the latter part of the talk, we propose a programming model for FPGAs. Because of the recent slowdown in silicon technology and the increasing power consumption of hardware, several dedicated architectures have been proposed in high-performance computing (HPC) to exploit the limited number of transistors in a chip with low power consumption. Although the field-programmable gate array (FPGA) is considered one of the promising solutions for realizing dedicated hardware for HPC, it is difficult for non-experts to program FPGAs due to the gap between their applications and hardware-level programming models for FPGAs. To improve productivity on FPGAs, we propose a C/C++ based programming framework, C2SPD, for describing stream processing on FPGAs. C2SPD provides directives to specify code regions to be offloaded onto FPGAs; two popular performance optimization techniques, vectorization and loop unrolling, can also be described in the directives. The compiler is implemented on top of the widely used open-source compiler infrastructure LLVM. It takes C/C++ code as input and translates it into DSL code for the FPGA backend and into CPU binary code. The DSL code is translated into Verilog HDL by the FPGA backend and passed to the vendor's FPGA compiler to generate hardware. The CPU binary code includes C2SPD runtime calls to control the FPGA and transfer data between the CPU and the FPGA. C2SPD assumes a single PCI-card type FPGA device, so data transfer involves communication over the PCI Express interface. The C2SPD compiler uses SPGen, a data-flow high-level synthesis (HLS) tool, as the FPGA backend. SPGen is an HLS tool for stream processing on FPGAs; its compiler takes its DSL, the Stream Processing Description (SPD), and generates pipelined stream cores on FPGAs. Although the range of applications is limited by this domain-specific approach, it can generate highly pipelined hardware on FPGAs. A 2D stencil computation kernel written in C with C2SPD directives achieves 175.41 GFLOPS on the generated FPGA hardware using 256 stream cores. The performance evaluation shows that vectorization can exploit the FPGA memory bandwidth and that loop unrolling can generate deep pipelines to hide instruction latency. By modifying numbers in the directives, the user can easily change the configuration of the generated hardware on the FPGA and optimize performance.
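
As a minimal illustration of the standard OpenMP mechanism that the proposal extends, the sketch below uses "#pragma omp declare simd" to have the compiler generate a vector variant of a scalar function for use inside an "omp simd" loop. The directive proposed in the talk goes further by letting the user supply that variant by hand (for example with intrinsics), in the spirit of OpenMP 5.0's declare variant; the function names blend and composite below are hypothetical.

    // Scalar reference implementation; "declare simd" asks the compiler to also
    // emit a SIMD variant that can be called from vectorized loops.
    #pragma omp declare simd notinbranch
    float blend(float a, float b, float alpha) {
        return alpha * a + (1.0f - alpha) * b;       // simple image-composition kernel
    }

    void composite(const float* src, const float* dst, float* out, float alpha, int n) {
        #pragma omp simd
        for (int i = 0; i < n; ++i)
            out[i] = blend(src[i], dst[i], alpha);   // compiler may call the SIMD variant here
    }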

The 154th R-CCS Cafe
Date and Time: Tue. Nov. 27, 2018, 14:00 - 15:00
Place: Lecture Hall (6th floor) at R-CCS

Title: Performance portable parallel CP-APR tensor decompositions
Speaker: Keita Teranishi (Principal Member of Technical Staff, Sandia National Laboratories, California)

Presentation Language: English
Presentation Material: English

Abstract:

Tensors have found utility in a wide range of applications, such as chemometrics, network traffic analysis, neuroscience, and signal processing. Many of these data science applications have increasingly large amounts of data to process and require high-performance methods to provide a reasonable turnaround time for analysts. Sparse tensor decomposition is a tool that allows analysts to explore a compact representation (low-rank models) of high-dimensional data sets, expose patterns that may not be apparent in the raw data, and extract useful information from the large amount of initial data. In this work, we consider decomposition of sparse count data using CANDECOMP-PARAFAC Alternating Poisson Regression (CP-APR).
Unlike the alternating least squares (ALS) version, the CP-APR algorithm involves non-trivial constrained optimization of a nonlinear, nonconvex function, which has contributed to its slow adoption on high-performance computing (HPC) systems. Recent studies by Kolda et al. suggest multiple variants of the CP-APR algorithm amenable to both data and task parallelism, but their parallel implementation involves several challenges due to the continuing trend toward a wide variety of HPC system architectures and programming models.
To this end, we have implemented a production-quality sparse tensor decomposition code, named SparTen, in C++ using Kokkos as a hardware abstraction layer. By using Kokkos, we have been able to develop a single code base and achieve good performance on each architecture. Additionally, SparTen is templated on several data types, enabling mixed precision so that the user can tune performance and accuracy for specific applications. In this presentation, we will use SparTen as a case study to document the performance gains, the performance/accuracy tradeoffs of mixed precision in this application, and the development effort, and to discuss the level of performance portability achieved. Performance profiling results from each of these architectures will be shared to highlight the difficulties of efficiently processing sparse, unstructured data. By combining these results with an analysis of each hardware architecture, we will discuss insights for better use of the available cache hierarchy, the potential costs and benefits of analyzing the sparsity pattern of the input data as a preprocessing step, the aspects of these hardware architectures that are critical for improved performance in sparse tensor applications, and where performance may still have been left on the table due to having a single algorithm implementation across diverging hardware architectures.
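
To illustrate the hardware-abstraction approach (a generic Kokkos example, not SparTen code; the view names and sizes are made up), the sketch below shows how a single parallel_for written against Kokkos views compiles to OpenMP threads on CPUs or CUDA kernels on GPUs, depending only on the build configuration.

    #include <Kokkos_Core.hpp>

    int main(int argc, char* argv[]) {
        Kokkos::initialize(argc, argv);
        {
            const int nnz = 1000000;
            // Views live in the memory space of the default execution space
            // (host threads, CUDA, HIP, ...), selected at compile time.
            Kokkos::View<double*> values("values", nnz);
            Kokkos::View<double*> result("result", nnz);

            // The same source runs as an OpenMP loop on CPUs or a CUDA kernel on GPUs:
            // this is the "single code base" portability argument in the abstract.
            Kokkos::parallel_for("scale", nnz, KOKKOS_LAMBDA(const int i) {
                result(i) = 2.0 * values(i);
            });
            Kokkos::fence();
        }
        Kokkos::finalize();
        return 0;
    }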

The 153rd R-CCS Cafe-part I
Date and Time: Mon. Nov. 26, 2018, 14:10 - 14:40
Place: Lecture Hall (6th floor) at R-CCS

Title: Learning with less labeled data using GANs
Speaker: Foo Chuan Sheng (A*STAR I2R; Programme Head, Precision Medicine; Scientist, Deep Learning Department)

Presentation Language: English
Presentation Material: English

Abstract:

Deep neural network classifiers typically require large labeled datasets to obtain high predictive performance. Obtaining such datasets can be time and cost prohibitive especially for applications where careful expert labeling is required, for instance, in healthcare and medicine. In this talk, we describe two algorithms using GANs that can help reduce this labeling burden. First, we describe a semi-supervised learning algorithm that utilizes GANs to perform manifold regularization. Our method achieves state-of-the-art performance amongst GAN-based semi-supervised learning methods while being much easier to implement. Second, we describe the Adversarially Learned Anomaly Detection (ALAD) algorithm (based on bi-directional GANs) for unsupervised anomaly detection. ALAD uses reconstruction errors based on adversarially learned features to determine if a data sample is anomalous. ALAD achieves state-of-the-art performance on a range of image and tabular datasets while being several hundred-fold faster at test time than the only published GAN-based method.

The 153rd R-CCS Cafe-part II
Date and Time: Mon. Nov. 26, 2018, 14:50 - 15:50
Place: Lecture Hall (6th floor) at R-CCS

Title: Deep Learning 2.0: From algorithms to silicon
Speaker: Vijay Ramaseshan Chandrasekhar (Head of the AI Group, A*STAR I2R)

Presentation Language: English
Presentation Material: English

Abstract:

The Deep Learning 2.0 program is a multi-year A*STAR AI program aimed at capturing the next wave of deep learning.
The program focuses on:
(a) 10x open problems in deep learning algorithmic research: thrusts include learning with 10x fewer labeled samples, compressing networks by 100x, incorporating knowledge graphs into deep learning, online deep learning, and white-box deep learning.
(b) Next generation hardware for deep learning: we are looking beyond GPUs and TPUs, and reimagining the entire hardware stack for deep learning from algorithms all the way down to silicon.
(c) New emerging enterprise applications for deep learning: ranging from personalized medicine, finance, and healthcare to IoT and advanced semiconductor manufacturing.
(d) Deep learning on encrypted data: addressing the challenges at the intersection of deep learning and homomorphic encryption to bring this technology closer to adoption.