ASPEN.K2
Overview
ASPEN.K2 provides GPU/CUDA users with automatically tuned, high-performance Level 2 BLAS kernels. The name ASPEN.K2 is short for "Automatic Stabilizing and PErformaNce tuning of level 2 BLAS Kernels".
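For orientation, SYMV is the Level 2 BLAS operation y ← αAx + βy for a symmetric matrix A, of which only one triangle is read. The block below is a minimal CPU reference sketch of that operation for the lower-triangle case, assuming column-major storage as in standard BLAS. It is not ASPEN.K2 code: the function name `symv_lower_ref` and its signature are hypothetical, chosen only for illustration. A reference like this can be useful for checking the output of a GPU kernel on small inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference SYMV: y <- alpha * A * x + beta * y, where A is an n-by-n
// symmetric matrix stored column-major with leading dimension lda.
// Only the lower triangle (row index i >= column index j) is read.
// Hypothetical helper for illustration; not part of the ASPEN.K2 API.
template <typename T>
void symv_lower_ref(std::size_t n, T alpha, const std::vector<T>& a,
                    std::size_t lda, const std::vector<T>& x, T beta,
                    std::vector<T>& y) {
    assert(lda >= n && a.size() >= lda * n);
    assert(x.size() >= n && y.size() >= n);

    std::vector<T> acc(n, T(0));  // accumulates A * x
    for (std::size_t j = 0; j < n; ++j) {
        acc[j] += a[j + j * lda] * x[j];       // diagonal element A(j,j)
        for (std::size_t i = j + 1; i < n; ++i) {
            const T aij = a[i + j * lda];      // stored element A(i,j), i > j
            acc[i] += aij * x[j];              // contribution of A(i,j)
            acc[j] += aij * x[i];              // contribution of A(j,i) = A(i,j)
        }
    }
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = alpha * acc[i] + beta * y[i];
    }
}
```

Each stored off-diagonal element is used twice, once for its own position and once for its mirror image, so the upper triangle is never touched.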
Downloads
- ASPEN.K2 version 1.9p1 (tar.gz, 11.3MB) (Oct. 25, 2022)
- ASPEN.K2 version 1.6 (tgz, 19.4MB) (Apr. 6, 2017)
  - Support the double-double format: wsymv and uhemv.
  - Support both static and shared library formats: libaspen.a and libaspen.so.
  - Support a stream interface: ASPEN_setStream and ASPEN_getStream.
  - Unify the symv and hemv templates into a single hesymv template.
  - Tune the kernel by introducing internal fma2 and add2 operations.
  - Minor modifications to the parameter search scripts.
  - The GEMV kernels are obsolete as of this version.
- ASPEN.K2 version 1.5p9 (tgz, 16.7MB) (Nov. 25, 2016)
  - Support complex functions: zhemv and chemv.
  - Fix a severe bug in which a store operation to L2 was unexpectedly overtaken by atomic operations, due to a TLB miss or other issues. It occurred very rarely in 1.5p8.
  - Minor modifications to the parameter search scripts.
- ASPEN.K2 version 1.5p8 (tgz, 8.7MB) (Jul. 21, 2016)
  - Change the block-grid shape to a two-dimensional grid.
  - Simplify the compilation process, which is done for each sm_(20,30,…).
  - Modify the Makefile rules to reduce compilation time.
  - Experimental support for CUDA 8.0 and Pascal architectures (sm_(60,61,62)).
  - Experimental support for half precision in Level 1 BLAS.
- If you need older versions, please contact us (imamura.toshiyuki [at] riken.jp).
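The double-double format mentioned in the version 1.6 notes represents one value as the unevaluated sum of two doubles, a high word and a much smaller low word. The sketch below shows the commonly used building blocks for adding two such values (Knuth's TwoSum followed by a renormalization step). It is a generic illustration of the technique, not the ASPEN.K2 implementation: the type `DoubleDouble` and the functions `two_sum`, `dd_add`, and `dd_demo` are hypothetical names, and the internal add2 and fma2 operations of the library may be implemented differently.

```cpp
#include <cassert>

// A double-double value: the unevaluated sum hi + lo,
// where |lo| is much smaller than |hi|.
// Hypothetical type for illustration; not part of the ASPEN.K2 API.
struct DoubleDouble {
    double hi;
    double lo;
};

// Knuth's TwoSum: returns s = fl(a + b) in hi and the rounding error in lo,
// so that a + b == hi + lo exactly (barring overflow).
inline DoubleDouble two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;                       // the part of b that reached s
    double e = (a - (s - bb)) + (b - bb);    // what was rounded away
    return {s, e};
}

// Adds two double-double numbers (simple variant: the low words are
// added in plain double arithmetic, then the result is renormalized).
inline DoubleDouble dd_add(DoubleDouble x, DoubleDouble y) {
    DoubleDouble s = two_sum(x.hi, y.hi);
    double lo = s.lo + (x.lo + y.lo);
    double hi = s.hi + lo;                   // renormalize into (hi, lo)
    lo = lo - (hi - s.hi);
    return {hi, lo};
}

// 1 + 1e-20 rounds to 1 in a single double, but the double-double sum
// keeps the small term in the low word.
inline void dd_demo() {
    DoubleDouble r = dd_add({1.0, 0.0}, {1e-20, 0.0});
    assert(r.hi == 1.0);
    assert(r.lo == 1e-20);
}
```

This kind of arithmetic relies on strict IEEE 754 evaluation, so it should not be compiled with options that allow floating-point reassociation (for example fast-math modes).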
Benchmarks
- SSYMV-L on Tesla P100
- DSYMV-L on Tesla P100
- CHEMV-L on Tesla P100
- ZHEMV-L on Tesla P100
Publications
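- T. Imamura, D. Mukunoki, S. Yamada, M. Machida, “Multi-GPU implementation of the SYMV and GEMV routines and its evaluation”, IPSJ SIG Technical Report, High Performance Computing (HPC), Vol. 2015-HPC-151, No. 13, Sep. 2015 (in Japanese).
- T. Imamura, D. Mukunoki, S. Yamada, M. Machida, “Implementation and evaluation of CUDA-xSYMV”, IPSJ SIG Technical Report, High Performance Computing (HPC), Vol. 2014-HPC-146, No. 14, Oct. 2014 (in Japanese).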
- T. Imamura, S. Yamada and M. Machida, “A High Performance SYMV Kernel on a Fermi-core GPU”, High Performance Computing for Computational Science – VECPAR 2012, Lecture Notes in Computer Science (LNCS) 7851, pp. 59–71, 2013.
- T. Imamura, “ASPEN-K2: Automatic-tuning and Stabilization for the Performance of CUDA BLAS Level 2 Kernels”, 15th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP12), Seattle, USA, Feb. 15-17, 2012.
- T. Imamura, T. Utsumi, X. Lin, S. Yamada, M. Machida, “Performance Tuning for the SYMV kernel on multiple GPU generations, Fermi and Kepler”, IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Vol. 2013-HPC-138, No. 7, pp. 1–7, Feb. 2013 (in Japanese).