MUBLAS

Overview

MUBLAS (MUBLAS-GEMV) is a collection of Level-1/2 BLAS kernels, including GEMV, optimized for NVIDIA GPUs. Before launching each kernel, it automatically adjusts the thread-block size based on a theoretical performance model. (This code is no longer maintained.)
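
To illustrate what adjusting the thread-block size from a performance model can look like, here is a minimal sketch. It is not the MUBLAS model: it assumes a simple occupancy heuristic (maximize the number of resident threads per multiprocessor), and the names `SMLimits`, `resident_threads`, and `pick_block_size`, as well as the resource limits, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-multiprocessor limits (illustrative values, not taken from MUBLAS)."""
    max_threads: int = 2048   # resident threads per multiprocessor
    max_blocks: int = 16      # resident thread blocks per multiprocessor
    registers: int = 65536    # 32-bit registers per multiprocessor
    warp_size: int = 32


def resident_threads(block_size, regs_per_thread, sm):
    """Estimate how many threads stay resident on one multiprocessor."""
    # Threads are scheduled in whole warps, so round the block up to a warp multiple.
    warps = -(-block_size // sm.warp_size)
    padded = warps * sm.warp_size
    by_threads = sm.max_threads // padded
    by_regs = sm.registers // (regs_per_thread * padded)
    blocks = min(by_threads, by_regs, sm.max_blocks)
    return blocks * block_size


def pick_block_size(regs_per_thread, sm=SMLimits(), candidates=range(32, 1025, 32)):
    """Return the candidate block size with the most resident threads."""
    # max() keeps the first maximal item, so ties go to the smallest block size.
    return max(candidates, key=lambda b: resident_threads(b, regs_per_thread, sm))


if __name__ == "__main__":
    for regs in (32, 40, 64):
        print(regs, pick_block_size(regs))
```

The selection runs on the host before the kernel launch, so the kernel itself does not change; only its launch configuration does. A real model would also have to account for factors this sketch ignores, such as shared-memory usage and the problem size.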

Downloads

MUBLAS version 1.5.38 (tgz, 84KB) (November 28, 2016)
If you need older versions, please contact us (daichi.mukunoki [at] riken.jp).

Publications

Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377-384, Sep. 2016.
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations, GPU Technology Conference (GTC 2015), Mar. 2015.
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650, Mar. 2015.
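Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Automatic Thread Count Selection Method for Memory-Bound BLAS Kernels on NVIDIA GPUs (in Japanese), IPSJ SIG Technical Report, Vol. 2015-HPC-150, No. 13, Jul. 2015.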