On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and therefore non-reproducible, mainly because floating-point operations are non-associative and scheduling is dynamic. We address the problem of reproducibility in the context of fundamental linear algebra operations, such as those included in the Basic Linear Algebra Subprograms (BLAS) library, and propose algorithms that yield both reproducible and accurate results. We extend this approach to higher-level linear algebra algorithms built on top of these BLAS kernels, e.g., the LU factorization. We present these reproducible and accurate algorithms for the BLAS routines and the LU factorization, as well as their implementations in parallel environments such as Intel server CPUs, Intel Xeon Phi, and both NVIDIA and AMD GPUs. We show that the performance of our implementations is comparable to that of the standard ones.
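As a small illustration of why reductions can be non-reproducible, the sketch below sums the same four numbers in every possible order, standing in for the different orders a dynamic scheduler might produce. It is not the speaker's algorithm: `naive_sum` and `distinct_results` are hypothetical helper names, and Python's standard `math.fsum` (a correctly rounded summation) is used only as a stand-in for an order-independent, accurate reduction.

```python
import itertools
import math

def naive_sum(values):
    """Left-to-right floating-point accumulation; the result depends on order."""
    total = 0.0
    for v in values:
        total += v
    return total

def distinct_results(values, summer):
    """Apply `summer` to every ordering of `values` and collect distinct results."""
    return {summer(p) for p in itertools.permutations(values)}

# Non-associativity: the same three operands, grouped differently.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c    # a + b cancels exactly, then 1.0 is added
right = a + (b + c)   # 1.0 is lost when added to -1e16 (rounding)

# The exact sum of these values is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
naive = distinct_results(values, naive_sum)   # several different answers
exact = distinct_results(values, math.fsum)   # one answer for all orderings

print(left, right)
print(sorted(naive))
print(sorted(exact))
```

The naive accumulation returns more than one value depending on the order of the operands, while the correctly rounded sum returns the same value for every ordering.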
Date and time: Thursday, May 31, 2018, 13:00 – 14:00
(14:00 – 14:30 Cafe Time)
Venue: R-CCS, 6th floor auditorium
・Title: Reproducibility and accuracy of BLAS routines and their application
・Speaker: Roman Iakymchuk (Postdoctoral researcher, KTH, Sweden)