Workshop on Large-scale Parallel Numerical Computing Technology
(LSPANC 2019 June)
— Technologies and Tools for Reliable, Accurate, and Mixed-Precision Computations —
June 6 – 7, 2019
RIKEN Center for Computational Science (R-CCS), Kobe, Japan
Overview
Numerical computations with floating-point arithmetic suffer from rounding errors, so computed results may be inaccurate. In parallel computations, rounding errors also cause reproducibility issues, since floating-point addition is not associative and the order of operations can vary between runs. These issues can be critical for the reliability of results in large-scale computing, as well as for the development and debugging of complex codes. Moreover, reduced-precision hardware and mixed-precision approaches on such hardware are increasingly used to achieve better performance in terms of both speed and energy efficiency. Methods and tools for addressing the reliability and quality of numerical computations will therefore become even more important toward the exascale computing era. This workshop discusses such methods and tools for HPC systems.
Information
- Location: Seminar room (ground floor), R-CCS building, RIKEN Center for Computational Science (R-CCS), Kobe, Japan.
- Registration: Not required (except for the Social Lunch and the K Computer Tour). Attendance is free of charge (except for the Social Lunch).
- Social Lunch (on June 7): Pre-registration is required. If you want to join, please send an email to Daichi Mukunoki (daichi.mukunoki[at]riken.jp) by June 5 (Wed), 2019. The social lunch will be held at “Flower Forest”, a buffet restaurant in the Kobe Animal Kingdom (a small zoo next to R-CCS). The entrance fee (regular price: 1800 JPY) will be waived, but please pay 1430 JPY for lunch at the entrance.
- K Computer Tour (on June 7): Pre-registration is required. If you want to join, please send an email to Daichi Mukunoki (daichi.mukunoki[at]riken.jp) by June 5 (Wed), 2019. The K computer will retire this August.
- Wifi access: Eduroam and R-CCS guest wifi are available.
- Note: This workshop does not publish proceedings. There are no stores within walking distance of the R-CCS building, but there is a small cafeteria and shop in the FOCUS building (connected to the R-CCS building), where you can have lunch and buy snacks and drinks.
- Contact: Daichi Mukunoki, RIKEN R-CCS (daichi.mukunoki[at]riken.jp)
Program
Day-1: June 6 (Thursday)
10:00–10:10 | Opening |
10:10–10:50 | Talk: “High Precision Floating and Integer Arithmetic on Supercomputing Environment” (Toshiyuki Imamura, RIKEN R-CCS) |
Abstract: One of our team's missions is to develop high-precision numerical solvers that run on various parallel supercomputing environments, such as the K computer at RIKEN R-CCS, OFP at U-Tokyo, Tsubame at TiTech, and Cygnus at Tsukuba University. Our slogan is “anywhere, anytime HPC (High-Performance Computing)”. Naturally, our background spans computer science, applied mathematics, and related engineering fields. Since the team was established, we have developed several high-performance, high-precision software packages. EigenQP, together with the related BLAS and a subset of LAPACK, is a powerful tool for users who want to confirm the results of eigenvalue problems and general linear algebra calculations using the extended double-double (DD) format. We extend the idea of DD, in which multiple floating-point numbers are shifted and assembled, to multiple-precision integer and fixed-point formats; EigenG, ASPEN.K2, and MUBLAS on CUDA GPUs are typical examples. We exploit advanced features of modern programming languages, such as templates, abstract classes, and inheritance/polymorphism, and we have put much effort into optimizing performance. In the talk, several of these high-precision, high-performance numerical libraries are demonstrated with benchmark results on several supercomputers and high-end GPU systems. | |
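The double-double (DD) format mentioned in this abstract stores a value as the unevaluated sum hi + lo of two binary64 numbers. As a minimal sketch of the idea (not EigenQP's or MUBLAS's code), the Python below adds DD numbers using Knuth's TwoSum error-free transformation. Its structure is modeled on the accurate DD addition of the QD library, but it uses TwoSum for every renormalization step to avoid FastTwoSum's ordering precondition. The function names are our own. It assumes IEEE 754 binary64 with round-to-nearest and no overflow.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and the exact rounding error e, so a + b == s + e."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two double-double numbers, each an unevaluated sum (hi, lo) of two binary64 values."""
    s1, s2 = two_sum(x[0], y[0])  # high parts and their rounding error
    t1, t2 = two_sum(x[1], y[1])  # low parts and their rounding error
    s2 += t1
    s1, s2 = two_sum(s1, s2)      # renormalize: hi carries the leading bits
    s2 += t2
    return two_sum(s1, s2)


def dd_sum(values: list[float]) -> tuple[float, float]:
    """Accumulate binary64 values in double-double."""
    acc = (0.0, 0.0)
    for v in values:
        acc = dd_add(acc, (v, 0.0))
    return acc


def naive_sum(values: list[float]) -> float:
    """Left-to-right binary64 summation (explicit loop; Python 3.12+ sum() is compensated)."""
    s = 0.0
    for v in values:
        s += v
    return s


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16]
    print(naive_sum(data))  # 0.0: the 1.0 is lost when added to 1e16
    print(dd_sum(data))     # (1.0, 0.0): the low word keeps the lost bits
```

Summing [1e16, 1.0, -1e16] left to right in binary64 gives 0.0, because 1e16 + 1.0 rounds back to 1e16. The DD accumulation keeps the 1.0 in its low word.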
10:50–11:00 | Break |
11:00–11:50 | Invited talk: “High-Precision Anchored Accumulators for Reproducible Floating-Point Summation” (Neil Burgess, ARM) |
Abstract: This paper introduces a new datatype, the High-Precision Anchored (HPA) number, that allows reproducible accumulation of floating-point (FP) numbers in a programmer-selectable range. The new datatype has a larger significand and a smaller range than existing FP formats and has much better arithmetic and computational properties. In particular, it is associative, parallelizable and reproducible. The paper also describes how HPA processing can be implemented as part of Arm’s new Scalable Vector Extension (SVE) together with proposals for new instructions aimed specifically at the new datatype. For the modest ranges that will accommodate most problems, HPA processing is much faster than FP arithmetic: performance modelling shows 2-lane HPA accumulation of FP64 operands is 9.5 times faster on Arm’s new vector architecture than double double accumulation and accelerates a recently published software algorithm for 3-lane reproducible FP summation by a factor of 5.6. | |
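A software sketch can show why an anchored fixed-point accumulator is reproducible. Each FP64 input is converted exactly into an integer count of units 2**anchor, and integer addition is associative, so every summation order yields the same bits. The code below is only a Python emulation of that principle. `anchor` and `width` are hypothetical parameters standing in for the programmer-selected range. It models neither Arm's HPA lane layout nor the proposed SVE instructions. Python's unbounded integers also hide overflow of the accumulator itself, which real HPA processing has to manage.

```python
import random
from fractions import Fraction


def to_fixed(x: float, anchor: int, width: int) -> int:
    """Return x as an exact integer count of units 2**anchor, within a signed width-bit range."""
    q = Fraction(x) / Fraction(2) ** anchor
    if q.denominator != 1:
        raise ValueError(f"{x!r} has bits below 2**{anchor}; choose a smaller anchor")
    if abs(q.numerator) >= 2 ** (width - 1):
        raise OverflowError(f"{x!r} does not fit in {width} bits at anchor {anchor}")
    return q.numerator


def hpa_sum(values: list[float], anchor: int = -60, width: int = 192) -> float:
    """Sum exactly in fixed point; integer addition is associative, so order does not matter."""
    acc = 0
    for x in values:
        acc += to_fixed(x, anchor, width)
    # Single rounding at the end: Fraction -> float conversion is correctly rounded.
    return float(Fraction(acc) * Fraction(2) ** anchor)


def naive_sum(values: list[float]) -> float:
    """Left-to-right binary64 accumulation, whose result can depend on the order."""
    s = 0.0
    for x in values:
        s += x
    return s


if __name__ == "__main__":
    rng = random.Random(0)
    # Inputs are multiples of 2**-40 below 2**10 in magnitude, so they fit the default anchor.
    data = [rng.randint(-2**50, 2**50) * 2.0**-40 for _ in range(10_000)]
    for _ in range(3):
        rng.shuffle(data)
        print(f"naive: {naive_sum(data)!r}  hpa: {hpa_sum(data)!r}")
```

Shuffling the inputs may change the naive binary64 sum, but it never changes the fixed-point result. In this emulation, that result is also the correctly rounded exact sum.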
11:50–13:00 | Lunch Break |
13:00–13:50 | Invited talk: “INTLAB – The Matlab/Octave Toolbox for Reliable Computing” (Siegfried M. Rump, Hamburg University of Technology and Waseda University Tokyo) |
Abstract: The result of a numerical algorithm is usually a good approximation to the true result, even for difficult problems. However, results may sometimes be incorrect, even completely wrong, and sometimes without warning. In contrast, the result of a so-called verification algorithm is always mathematically correct, taking into account all sources of error, in particular rounding errors due to the limited precision of floating-point arithmetic. In this talk, the principles of verification methods are discussed using INTLAB, the Matlab/Octave toolbox for Reliable Computing. The toolbox has several thousand users in more than 50 countries. | |
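INTLAB is a Matlab/Octave toolbox, so the following is not its API. It is a hypothetical Python `Interval` class that sketches the principle behind verification methods: every operation returns an interval guaranteed to contain the exact result. INTLAB obtains its bounds by switching the rounding mode, which Python cannot do portably. This sketch instead widens each round-to-nearest result by one ulp with `math.nextafter` (Python 3.9+). A round-to-nearest result is off by at most half an ulp, so the bounds remain valid, only slightly looser. Overflow and NaN handling are omitted.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] with binary64 endpoints that encloses the exact value."""

    def __init__(self, lo: float, hi: float):
        if lo > hi:
            raise ValueError("empty interval")
        self.lo, self.hi = lo, hi

    @classmethod
    def point(cls, x: float) -> "Interval":
        return cls(x, x)

    @classmethod
    def enclose(cls, q: Fraction) -> "Interval":
        """Tightest interval with binary64 endpoints that contains the rational q."""
        f = float(q)  # nearest binary64 value
        lo = f if Fraction(f) <= q else math.nextafter(f, -math.inf)
        hi = f if Fraction(f) >= q else math.nextafter(f, math.inf)
        return cls(lo, hi)

    def __add__(self, other: "Interval") -> "Interval":
        # A round-to-nearest result is within half an ulp of the exact one,
        # so stepping one ulp outward gives a valid (slightly loose) bound.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other: "Interval") -> "Interval":
        ps = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(math.nextafter(min(ps), -math.inf),
                        math.nextafter(max(ps), math.inf))

    def contains(self, q: Fraction) -> bool:
        return Fraction(self.lo) <= q <= Fraction(self.hi)

    def __repr__(self) -> str:
        return f"[{self.lo!r}, {self.hi!r}]"


if __name__ == "__main__":
    tenth = Interval.enclose(Fraction(1, 10))
    acc, naive = Interval.point(0.0), 0.0
    for _ in range(10):
        acc = acc + tenth
        naive += 0.1
    print(naive)                            # not exactly 1
    print(acc, acc.contains(Fraction(1)))   # the enclosure provably contains 1
```

Adding 0.1 ten times in binary64 does not give exactly 1. The interval sum, which starts from a rigorous enclosure of 1/10, provably contains 1.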
13:50–14:00 | Break |
14:00–14:50 | Invited talk: “MPLAPACK: Multiple Precision Version of BLAS and LAPACK” (Maho Nakata, RIKEN ACCC) |
Abstract: MPLAPACK is a multiple-precision version of BLAS and LAPACK. It is currently being developed at https://github.com/nakatamaho/mplapack. Through C++ classes, it supports five popular multiple-precision libraries and formats: MPFR, GMP, QD, binary128, and DD. We show how to use MPLAPACK in a demonstration. | |
14:50–15:00 | Break |
15:00–16:00 | Invited talk: “Principles of Discrete Stochastic Arithmetic (DSA) – The CADNA & PROMISE Tools (Part-1)” (Fabienne Jézéquel, Sorbonne University) |
Abstract: Discrete Stochastic Arithmetic (DSA) is an automatic method for rounding error analysis based on a probabilistic approach. DSA makes it possible to estimate the number of exact significant digits in computed results by executing the user program several times synchronously using a random rounding mode. We present the CADNA library (http://cadna.lip6.fr), an implementation of DSA that controls the numerical quality of sequential or parallel programs and detects numerical instabilities generated during their execution. A particular version of CADNA that enables numerical validation in hybrid CPU-GPU environments is also described. Finally, we present PROMISE (PRecision OptiMISE, http://promise.lip6.fr), a tool for precision auto-tuning. Most numerical simulations are performed in double precision (IEEE 754 binary64), which can be costly in terms of computing time, memory transfer, and energy consumption. Based on CADNA, PROMISE aims to reduce the number of double-precision variable declarations in numerical programs in favor of single-precision ones, while taking into account a requested accuracy of the results. | |
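The core of DSA can be sketched in a few lines, assuming the CESTAC formulation with N = 3 samples. Random rounding is emulated exactly with `fractions.Fraction`: each result is rounded down or up with probability 1/2. The number of exact significant digits of the mean is then estimated as C = log10(sqrt(N) * |mean| / (tau * sigma)), where tau = 4.303 is the Student t value for 2 degrees of freedom at 95% confidence. This is a hypothetical teaching sketch, not CADNA's API. It reruns the whole computation three times, whereas CADNA propagates the three samples synchronously through every operation, which is how it detects numerical instabilities during execution.

```python
import math
import random
import statistics
from fractions import Fraction

RNG = random.Random()  # source of the random rounding decisions


def random_round(exact: Fraction) -> float:
    """Round an exact value down or up to binary64 with probability 1/2 each."""
    f = float(exact)  # round-to-nearest: one of the two candidates
    if Fraction(f) == exact:
        return f      # representable: nothing to randomize
    other = math.nextafter(f, math.inf if exact > f else -math.inf)
    return RNG.choice((f, other))


def add(a: float, b: float) -> float:
    return random_round(Fraction(a) + Fraction(b))


def mul(a: float, b: float) -> float:
    return random_round(Fraction(a) * Fraction(b))


def significant_digits(samples: list[float]) -> float:
    """CESTAC estimate of the exact significant decimal digits of the mean of N = 3 samples."""
    if len(samples) != 3:
        raise ValueError("tau below is the Student t value for N = 3 only")
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf  # samples agree: no rounding error detected
    if mean == 0.0:
        return 0.0       # computational zero
    tau = 4.303          # Student t, 2 degrees of freedom, 95% confidence
    return max(0.0, math.log10(math.sqrt(3) * abs(mean) / (tau * sigma)))


def cubic(x: float) -> float:
    """(x - 1)**3 expanded and evaluated by Horner's rule: ((x - 3) * x + 3) * x - 1."""
    r = add(x, -3.0)
    r = add(mul(r, x), 3.0)
    return add(mul(r, x), -1.0)


if __name__ == "__main__":
    samples = [cubic(1.001) for _ in range(3)]  # rerun the same computation N = 3 times
    print(samples, significant_digits(samples))
```

Near the triple root x = 1, the estimate typically reports far fewer digits than the roughly 16 of binary64, reflecting cancellation. Because it uses only three samples, an occasional run can overestimate the accuracy by chance, for example when all samples happen to agree.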
16:00–16:10 | Break |
16:10–17:10 | Invited talk: “Principles of Discrete Stochastic Arithmetic (DSA) – The CADNA & PROMISE Tools (Part-2)” (Fabienne Jézéquel, Sorbonne University) |
Abstract: Continuation of Part 1. | |
17:10–17:20 | Announcement |
Day-2: June 7 (Friday)
10:00–10:50 | Invited talk: “Accurate and Validated Numerical Computing” (Fabienne Jézéquel, Sorbonne University) |
Abstract: To improve the numerical quality of results, one can increase the working precision. In addition to the widely used binary32 and binary64 formats, the IEEE 754-2008 standard defines the binary128 format, also called quadruple precision. Moreover, arbitrary-precision libraries such as ARPREC and MPFR exist. If a computation is simple enough, its accuracy can be improved with compensated algorithms. These algorithms are based on error-free transformations (EFTs), which compute exactly the rounding errors of some elementary operations such as addition and multiplication. Interval arithmetic and Discrete Stochastic Arithmetic (DSA) both make it possible to control the validity of numerical results. Both methods are based on directed rounding, either to provide guaranteed interval bounds or to estimate rounding errors via a random rounding mode. In this talk, we first describe the behaviour under directed rounding of compensated algorithms based on EFTs that are intended for rounding to nearest. We show how to compute tight interval inclusions with compensated algorithms. Then we show that DSA can be used to estimate the numerical quality of results computed with compensated algorithms. We consider compensated algorithms for summation, dot product, and polynomial evaluation with the Horner scheme. We also show that the validity of numerical results computed in quadruple or arbitrary precision can be controlled with DSA. This control is performed with an extension of the CADNA library (http://cadna.lip6.fr) for quadruple-precision programs and with the SAM library (Stochastic Arithmetic in Multiprecision, http://www-pequan.lip6.fr/~jezequel/SAM) for arbitrary-precision programs. We present results obtained with DSA in quadruple or arbitrary precision for various applications, such as chaotic sequences or the computation of multiple roots of polynomials. | |
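The compensated algorithms named in this abstract can be built from two classic EFTs: Knuth's TwoSum for addition and Dekker's TwoProduct for multiplication. TwoProduct uses Veltkamp splitting here, so no FMA is required. The sketch below shows compensated summation (Sum2 of Ogita, Rump and Oishi) and the compensated Horner scheme of Graillat, Langlois and Louvet in Python. It covers round-to-nearest only, since Python cannot switch to directed rounding, so the talk's directed-rounding, interval-inclusion, and DSA results are not reproduced here. It assumes binary64 without overflow or underflow.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: a + b == s + e exactly, with s = fl(a + b)."""
    s = a + b
    z = s - a
    return s, (a - (s - z)) + (b - z)


def split(a: float) -> tuple[float, float]:
    """Veltkamp splitting: a == hi + lo, each half with at most 26 significant bits."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's TwoProduct (no FMA needed): a * b == p + e exactly, with p = fl(a * b)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def comp_sum(values: list[float]) -> float:
    """Compensated summation (Sum2): add the exact rounding errors back at the end."""
    s = c = 0.0
    for v in values:
        s, e = two_sum(s, v)
        c += e
    return s + c


def horner(coeffs: list[float], x: float) -> float:
    """Plain Horner evaluation; coeffs run from the highest degree down to the constant."""
    r = coeffs[0]
    for a in coeffs[1:]:
        r = r * x + a
    return r


def comp_horner(coeffs: list[float], x: float) -> float:
    """Compensated Horner: Horner on the values plus Horner on their exact errors."""
    s, c = coeffs[0], 0.0
    for a in coeffs[1:]:
        p, pi = two_prod(s, x)
        s, sigma = two_sum(p, a)
        c = c * x + (pi + sigma)
    return s + c


if __name__ == "__main__":
    p = [1.0, -3.0, 3.0, -1.0]  # x^3 - 3x^2 + 3x - 1 == (x - 1)^3
    x = 1.0 + 2.0**-20
    print(horner(p, x), comp_horner(p, x), 2.0**-60)
    print(comp_sum([1e16, 1.0, -1e16]))
```

For p(x) = (x - 1)^3 at x = 1 + 2^-20, plain Horner returns 0.0, while the compensated version returns the exact value 2^-60. The only rounding error occurs in the last product, and TwoProduct captures it exactly.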
10:50–11:00 | Break |
11:00–11:50 | Invited talk: “Review of Error-Free Transformation of Matrix Multiplication, Basics and Applications” (Katsuhisa Ozaki, Shibaura Institute of Technology) |
Abstract: We introduce an error-free transformation of matrix multiplication. It is very useful for developing accurate numerical algorithms for matrix multiplication, and since optimized BLAS routines can be used directly for the transformation, its performance is very high. First, we review the key technique of the error-free transformation. Next, we introduce its applications. Recently, the error-free transformation has been applied to develop reproducible numerical algorithms. It is also used to generate test problems with specified solutions in numerical linear algebra, for example, systems of linear equations and eigenvalue problems. We present numerical examples for these applications. | |
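The following Python sketch illustrates an error-free transformation of matrix multiplication under simplifying assumptions. Each row of A (and each column of B) is split into slices of at most w bits relative to that row's (column's) largest entry, with w = floor((53 - ceil(log2 n)) / 2) for inner dimension n. Every entry of a slice product is then an integer multiple of a common unit, and all partial sums stay below 2^53 in those units. An ordinary floating-point matrix product therefore computes each slice product without rounding error. Some details differ from the original formulation. The splitting here uses truncation via `math.ldexp` rather than its fl((a + σ) - σ) extraction. Plain Python loops stand in for an optimized BLAS GEMM. Overflow and underflow are assumed not to occur. Summing the exact slice products with `math.fsum` gives the correctly rounded product.

```python
import math
import random
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain binary64 matrix product; stands in for an optimized BLAS GEMM."""
    C = []
    for row in A:
        out = []
        for col in zip(*B):
            acc = 0.0
            for a, b in zip(row, col):
                acc += a * b
            out.append(acc)
        C.append(out)
    return C


def split_rows(M, w):
    """Error-free split M == sum(slices). In row i, slice k holds integer multiples of
    2**(e_i - k*w) below 2**(e_i - (k-1)*w) in magnitude (at most w bits each),
    where 2**e_i bounds the row's largest entry."""
    M = [[float(v) for v in row] for row in M]
    exps = [math.frexp(max(abs(v) for v in row))[1] for row in M]
    prev = [[0.0] * len(row) for row in M]
    slices, k = [], 0
    while prev != M:
        k += 1
        # Truncate each entry to its leading k*w bits relative to its row (exact in binary64).
        high = [[math.ldexp(math.trunc(math.ldexp(v, k * w - e)), e - k * w) for v in row]
                for row, e in zip(M, exps)]
        slices.append([[h - p for h, p in zip(hr, pr)] for hr, pr in zip(high, prev)])
        prev = high
    return slices


def ozaki_matmul(A, B):
    """A @ B from exact slice products, summed with one correct rounding per entry."""
    n = len(B)                                # inner dimension
    w = (53 - math.ceil(math.log2(n))) // 2   # n * 2**(2w) <= 2**53: slice products are exact
    A_slices = split_rows(A, w)
    B_slices = [transpose(S) for S in split_rows(transpose(B), w)]
    products = [matmul(SA, SB) for SA in A_slices for SB in B_slices]
    return [[math.fsum(P[i][j] for P in products) for j in range(len(B[0]))]
            for i in range(len(A))]


def correctly_rounded_product(A, B):
    """Reference result: the exact rational product rounded once to binary64."""
    return [[float(sum(Fraction(a) * Fraction(b) for a, b in zip(row, col))) for col in zip(*B)]
            for row in A]


if __name__ == "__main__":
    A = [[1e16, 1.0, -1e16]]
    B = [[1.0], [1.0], [1.0]]
    print(matmul(A, B), ozaki_matmul(A, B))  # plain GEMM loses the 1.0
    rng = random.Random(0)
    X = [[rng.uniform(-1, 1) * 10.0 ** rng.randint(-8, 8) for _ in range(5)] for _ in range(4)]
    Y = [[rng.uniform(-1, 1) * 10.0 ** rng.randint(-8, 8) for _ in range(3)] for _ in range(5)]
    print(ozaki_matmul(X, Y) == correctly_rounded_product(X, Y))
```

For A = [[1e16, 1.0, -1e16]] and B = [[1.0], [1.0], [1.0]], the plain product returns [[0.0]], while `ozaki_matmul` returns [[1.0]]. The number of slices, and hence of GEMM calls, grows with the dynamic range of each row and column.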
11:50–14:00 | Social Lunch (at Kobe Animal Kingdom, optional) |
Note: Pre-registration is required. If you want to join, please send an email to Daichi Mukunoki (daichi.mukunoki[at]riken.jp) by May 27 (Mon), 2019. It will be held at “Flower Forest”, a buffet restaurant in the Kobe Animal Kingdom (a small zoo next to R-CCS). The entrance fee (regular price: 1800 JPY) will be waived, but please pay 1430 JPY for lunch at the entrance. We will depart from the conference venue at 11:50. The entrance fee is waived only if you enter together with the group. | |
14:00–15:00 | K Computer Tour (optional) |
Note: Pre-registration is required. If you want to join, please send an email to Daichi Mukunoki (daichi.mukunoki[at]riken.jp) by May 27 (Mon), 2019. We will depart from the conference venue at 14:00. | |
15:00–15:10 | Break |
15:10–15:40 | Talk: “High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs” (Daichi Mukunoki, RIKEN R-CCS) |
Abstract: This talk introduces implementations of accurate and reproducible BLAS routines on GPUs and their performance, and discusses the challenges and issues of supporting accurate and reproducible computations on state-of-the-art architectures from the viewpoint of high-performance computing. | |
15:40–16:10 | Invited Talk: “Preconditioned Cholesky QR Algorithms for Ill-conditioned Matrices” (Takeshi Terao, Shibaura Institute of Technology) |
Abstract: Cholesky QR algorithms, such as CholeskyQR and CholeskyQR2, are well suited to the thin QR decomposition of tall-skinny matrices because they avoid communication. However, Cholesky QR algorithms are not applicable to ill-conditioned matrices: for matrices represented in IEEE 754 binary64, they break down if the condition number exceeds 10^8. The aim of our study is to develop preconditioning methods for Cholesky QR algorithms. Using Cholesky factors for the preconditioning, Cholesky QR algorithms can be applied to matrices with condition numbers up to 10^16. We will show the efficiency and robustness of the proposed algorithms in parallel computing through numerical examples. | |
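As a baseline for the preconditioned variants in this talk, the Python sketch below implements plain CholeskyQR and CholeskyQR2 on small dense matrices stored as lists of rows. The preconditioning itself is not reproduced. CholeskyQR forms the Gram matrix W = A^T A, factors W = R^T R, and sets Q = A R^-1. CholeskyQR2 repeats the step on Q to restore orthogonality. Forming W squares the condition number, which is why the method can break down in binary64 once the condition number of A exceeds about 10^8: W may then no longer be numerically positive definite.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def cholesky_upper(W):
    """Upper-triangular R with R^T R == W; fails if W is not numerically positive definite."""
    n = len(W)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = W[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("Cholesky breakdown: Gram matrix is not numerically positive definite")
        R[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            R[i][j] = (W[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R


def right_solve_upper(A, R):
    """Q = A R^{-1}, row by row: solve q R = a by forward substitution."""
    Q = []
    for a in A:
        q = []
        for j in range(len(R)):
            q.append((a[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j])
        Q.append(q)
    return Q


def cholesky_qr(A):
    """CholeskyQR: R = chol(A^T A), Q = A R^{-1}. With the rows of A distributed across
    processes, only the small n x n Gram matrix needs a global reduction."""
    R = cholesky_upper(matmul(transpose(A), A))
    return right_solve_upper(A, R), R


def cholesky_qr2(A):
    """CholeskyQR2: repeat CholeskyQR on Q1 to restore orthogonality; A = Q2 (R2 R1)."""
    Q1, R1 = cholesky_qr(A)
    Q2, R2 = cholesky_qr(Q1)
    return Q2, matmul(R2, R1)


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0],
         [1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [2.0, 1.0, 1.0]]
    Q, R = cholesky_qr2(A)
    QtQ = matmul(transpose(Q), Q)
    print(max(abs(QtQ[i][j] - (i == j)) for i in range(3) for j in range(3)))
```

For this well-conditioned 6 x 3 example, the printed departure of Q^T Q from the identity should be close to the unit roundoff.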
16:10–16:40 | Talk: “The Effect of the Higher Precision on the IC Preconditioner” (Masatoshi Kawai, RIKEN R-CCS) |
Abstract: The incomplete Cholesky (IC) preconditioner is widely used for solving systems of linear equations (SLEs). Some applications, such as eigenvalue problems, require higher accuracy and robustness. In this talk, we discuss the effect of higher precision on the IC preconditioner. | |
16:40–17:00 | Summary & closing |
Acknowledgement
This workshop was supported by FOCUS Establishing Supercomputing Center of Excellence.