OzBLAS
Overview
Accurate and reproducible BLAS routines and conjugate gradient (CG) solvers based on the Ozaki scheme (Ozaki et al. 2012), for x86 CPUs and CUDA GPUs.
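To give a rough idea of what "accurate and reproducible" means here, the following is a minimal Python sketch of the error-free splitting idea behind the Ozaki scheme, applied to a dot product. It is not OzBLAS code and does not reflect the OzBLAS API: the names `split` and `ozaki_dot` are hypothetical, and the sketch assumes IEEE 754 double precision with round-to-nearest, equal-length inputs, and values far from overflow. Each vector is split into slices that carry so few significant bits that every slice-by-slice dot product is computed without rounding error, and the exact partial results are then combined with a correctly rounded sum.

```python
import math


def split(v, rho):
    """Split v into slices whose element-wise sum equals v exactly.

    Each slice keeps only the high-order bits of the current residual,
    aligned to a grid shared by all elements of that slice.
    """
    slices = []
    r = list(v)
    while any(e != 0.0 for e in r):
        mu = max(abs(e) for e in r)
        _, tau = math.frexp(mu)                 # guarantees mu < 2**tau
        sigma = math.ldexp(1.0, rho + tau)      # power-of-two shift constant
        hi = [(e + sigma) - sigma for e in r]   # rounds e to the shared grid
        r = [e - h for e, h in zip(r, hi)]      # residual, computed exactly
        slices.append(hi)
    return slices


def ozaki_dot(x, y):
    """Dot product of x and y, rounded once from the exact result (sketch)."""
    n = len(x)
    # rho >= (53 + log2(n + 1)) / 2, so that a sum of n slice products
    # still fits in the 53-bit significand of a double.
    rho = (53 + n.bit_length() + 1) // 2
    xs, ys = split(x, rho), split(y, rho)
    # Every slice-pair dot product below is free of rounding error.
    partials = [sum(a * b for a, b in zip(xi, yj)) for xi in xs for yj in ys]
    # math.fsum returns the correctly rounded sum of the exact partials.
    return math.fsum(partials)
```

For example, `ozaki_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns `1.0`, a case in which an ordinary left-to-right floating-point loop can lose the small term. Because the result is rounded only once from the exact value, it does not depend on the order of the elements or on how the work is divided, which is the sense in which the approach gives bit-wise reproducible results.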
Downloads
- OzBLAS version 1.4 alpha 02 (tgz, 45KB) (May 25, 2021)
- OzBLAS version 1.5 alpha 02 (tgz, 78KB) (Sep 15, 2023)
Publications
- D. Mukunoki, T. Ogita, K. Ozaki: Accurate and Reproducible BLAS Routines with Ozaki Scheme for Many-core Architectures, Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM2019), Lecture Notes in Computer Science, Vol. 12043, pp. 516-527, 2019.
- D. Mukunoki, K. Ozaki, T. Ogita, T. Imamura: DGEMM using Tensor Cores, and Its Accurate and Reproducible Versions, ISC High Performance 2020, Lecture Notes in Computer Science, Vol. 12151, pp. 230-248, 2020.
- D. Mukunoki, K. Ozaki, T. Ogita, R. Iakymchuk: Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme, Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021), pp. 100-109, 2021.