
Publications | 研究業績

2025 (updating)

Journal Articles

Yuki Uchino, Toshiyuki Imamura, "Parallel Tall-and-Skinny QR Factorization Based on LU-CholeskyQR Algorithm", 2025 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-10, 2025, doi.org/10.1109/CLUSTER59342.2025.11186492.
中嶋陸, 内野佑基, 成見哲, 今村俊幸, "INT8による倍精度複素行列積エミュレーションの高速化と混合精度入出力に対応したライブラリの実装", 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2025-HPC-200, No. 27, pp. 1-10, July 28, 2025, https://ipsj.ixsq.nii.ac.jp/records/2003164, (in Japanese).

Conference Proceedings

Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura, "High-Performance and Power-Efficient Emulation of Matrix Multiplication using INT8 Matrix Engines", Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, Nov. 16–21, 2025, pp. 1824–1831, doi.org/10.1145/3731599.3767539.

Preprint

Yuki Uchino, Qianxiang Ma, Toshiyuki Imamura, Katsuhisa Ozaki, Patrick Lars Gutsche, "Emulation of Complex Matrix Multiplication based on the Chinese Remainder Theorem", preprint, 2025, doi.org/10.48550/arXiv.2512.08321.
Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura, "High-Performance and Power-Efficient Emulation of Matrix Multiplication using INT8 Matrix Engines", preprint, 2025, doi.org/10.48550/arXiv.2508.03984.
Theresa Pollinger, Masado Ishii, Jens Domke, "The Beauty of Anisotropic Mesh Refinement: Omnitrees for Efficient Dyadic Discretizations", preprint, 2025, doi.org/10.48550/arXiv.2508.06316.
Katsuhisa Ozaki, Yuki Uchino, Toshiyuki Imamura, "Ozaki Scheme II: A GEMM-oriented emulation of floating-point matrix multiplication using an integer modular technique", preprint, 2025, doi.org/10.48550/arXiv.2504.08009.

2024

Journal Articles

Yuki Uchino, Katsuhisa Ozaki, Takeshi Terao, Toshiyuki Imamura, "Fast Generation of Real-Symmetric Matrices and their Exact Eigenpairs", Journal of Advanced Simulation in Science and Engineering, vol. 12, pp. 44–60, 2025, doi.org/10.15748/jasse.12.44.
Jorge Moron-Vidal, Francisco Bernal, Atsushi Suzuki, "An explicit substructuring method for overlapping domain decomposition based on stochastic calculus", Applied Numerical Mathematics, vol. 208, pp.340-355, 2025, doi.org/10.1016/j.apnum.2024.02.011.
Yuki Uchino, Takeshi Terao, Katsuhisa Ozaki, "Acceleration of iterative refinement for singular value decomposition", Numer. Algorithms, vol. 95, pp. 979–1009, 2024.
Piotr Luszczek, Ahmad Abdelfattah, Hartwig Anzt, Atsushi Suzuki, Stanimire Tomov, "Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications", Future Generation Computer Systems, vol. 160, pp. 359–374, 2024, doi.org/10.1016/j.future.2024.06.004.
Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura, "Performance enhancement of the Ozaki Scheme on integer matrix multiplication unit", The International Journal of High Performance Computing Applications, vol. 39, no. 3, pp. 462–476, 2025, doi.org/10.1177/10943420241313064.

Conference Proceedings

Yuki Uchino, Toshiyuki Imamura, "High-Performance Eigensolver Combining EigenExa and Iterative Refinement", SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, Nov. 17–22, 2024, pp. 1703–1712, doi.org/10.1109/SCW63240.2024.00213.

2023

Conference Proceedings

Kumar Saurabh, Masado Ishii, Makrand A Khanwale, Hari Sundar, Baskar Ganapathysubramanian. "Scalable adaptive algorithms for next-generation multiphase simulations". IPDPS 2023: IEEE International Parallel and Distributed Processing Symposium, St. Petersburg, FL, USA, 2023, pp. 590-601, doi.org/10.1109/IPDPS54959.2023.00065.

2022

Journal Articles

Yuki Uchino, Katsuhisa Ozaki, Takeshi Ogita, "Acceleration of iterative refinement for symmetric eigenvalue decomposition (in Japanese)", IPSJ Trans. ACS, vol. 15, no. 1, pp. 1–12, 2022.

Conference Proceedings

Takuya Ina, Yasuhiro Idomura, Toshiyuki Imamura, Naoyuki Onodera. “A new data conversion method for mixed precision Krylov solvers with FP16/BF16 Jacobi preconditioners,” Proc. the International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 23). pp. 29–34, Singapore, February 2023. doi.org/10.1145/3578178.3578222.
Yuta Hasegawa, Toshiyuki Imamura, Takuya Ina, Naoyuki Onodera, Yuuichi Asahi, and Yasuhiro Idomura, “GPU Optimization of Lattice Boltzmann Method with Local Ensemble Transform Kalman Filter,” Proc. IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH 2022), pp. 10–17, Dallas, November 2022, published on 30 January 2023, doi.org/10.1109/ScalAH56622.2022.00007.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, “Infinite-precision Inner Product and Sparse Matrix Vector Multiplication using Ozaki Scheme with Dot2 on Many-core Processors,” Proc. 14th International Conference on Parallel Processing and Applied Mathematics (PPAM 2022), 2022 (to appear)
Atsushi Suzuki, “A factorization algorithm for sparse matrix with mixed precision arithmetic,” ECCOMAS Congress 2022 - 8th European Congress on Computational Methods in Applied Sciences and Engineering, 2022, 12 pages, doi.org/10.23967/eccomas.2022.006.
Susumu Yamada, Toshiyuki Imamura, Masahiko Machida, “High Performance Parallel LOBPCG Method for Large Hamiltonian Derived from Hubbard Model on Multi-GPU Systems,” Proc. Supercomputing Frontiers: 7th Asian Conference (SCFA 2022), pp. 1–19, Singapore, March 1–3, 2022, doi.org/10.1007/978-3-031-10419-0_1.
Ryo Yoda, Matthias Bolten, Kengo Nakajima, and Akihiro Fujii. “Assignment of idle processors to spatial redistributed domains on coarse levels in multigrid reduction in time,” Proc. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022). pp. 41–51, 2022, doi.org/10.1145/3492805.3492810.
Masatoshi Kawai, and Kengo Nakajima. “Low/Adaptive Precision Computation in Preconditioned Iterative Solvers for Ill-Conditioned Problems,” Proc. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022). pp. 30–40, 2022, doi.org/10.1145/3492805.3492813.
Takashi Arakawa, Hisashi Yashiro, and Kengo Nakajima. “Development of a coupler h3-Open-UTIL/MP,” Proc. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022). pp. 72–83, 2022, doi.org/10.1145/3492805.3492809.
Shinji Sumimoto, Toshihiro Hanawa, and Kengo Nakajima, “A Process Management Runtime with Dynamic Reconfiguration,” Proc. IXPUG Workshop in conjunction with International Conference on High Performance Computing in Asia-Pacific Region Workshops (HPC Asia 2022 Workshop). pp. 10–18, 2022, doi.org/10.1145/3503470.3503473.
Masashi Horikoshi, Balazs Gerofi, Yutaka Ishikawa, and Kengo Nakajima. “Exploring Communication-Computation Overlap in Parallel Iterative Solvers on Manycore CPUs using Asynchronous Progress Control.” Proc. IXPUG Workshop in conjunction with International Conference on High Performance Computing in Asia-Pacific Region Workshops (HPC Asia 2022 Workshop), pp. 29–39, 2022, doi.org/10.1145/3503470.3503474.
Kengo Nakajima, Balazs Gerofi, Masashi Horikoshi, and Yutaka Ishikawa, “Communication-Computation Overlapping for Preconditioned Parallel Iterative Solvers with Dynamic Loop Scheduling.” Proc. IWAHPCE Workshop in conjunction with International Conference on High Performance Computing in Asia-Pacific Region Workshops (HPC Asia 2022 Workshop), pp. 60–71, 2022, doi.org/10.1145/3503470.3503477.

2021

Journal Articles

Kumar Saurabh†, Masado Ishii†, Milinda Fernando, Boshun Gao, Kendrick Tan, Ming-Chen Hsu, Adarsh Krishnamurthy, Hari Sundar, Baskar Ganapathysubramanian, "Scalable adaptive PDE solvers in arbitrary domains", SC'21: International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, 2021, pp. 1–18.
Yiyu Tan, Toshiyuki Imamura, and Masaaki Kondo, “FPGA-Based Acceleration of FDTD Sound Field Rendering,” Journal of the Audio Engineering Society, Vol.69, Issue 7/8, pp. 542–556, July 2021, doi.org/10.17743/jaes.2021.0025.
Shuhei Kudo, Yusaku Yamamoto, and Toshiyuki Imamura, “Error Analysis of the Cholesky QR-Based Block Orthogonalization Process for the One-Sided Block Jacobi SVD Algorithm,” Comput. Informatics 39(6) 1203–1228. doi.org/10.31577/cai_2020_6_1203 (2020) published 2021-05-20.

Conference Proceedings

Takeshi Terao, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, “Verified Numerical Computations for a Standard Eigenvalue Problem Without Directed Rounding,” Proc. the 40th JSST Annual International Conference on Simulation Technology, 2021.
Takuya Ina, Yasuhiro Idomura, Toshiyuki Imamura, Susumu Yamashita, and Naoyuki Onodera, “Iterative methods with mixed-precision preconditioning for ill-conditioned linear systems in multiphase CFD simulations,” Proc. 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 21), November 19, 2021, St. Louis, MO, USA, doi.org/10.1109/ScalA54577.2021.00006.
Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey C. Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, Toshiyuki Imamura, Akihiko Kasagi, Kentaro Kawakami, Shuhei Kudo, Akiyoshi Kuroda, Maxime Martinasso, Satoshi Matsuoka, Henrique Mendonça, Kazuki Minami, Prabhat Ram, Takashi Sawada, Mallikarjun Shankar, Tom St. John, Akihiro Tabuchi, Venkatram Vishwanath, Mohamed Wahib, Masafumi Yamazaki, Junqi Yin, “MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems,” 2021, IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), CoRR abs/2110.11466 (2021), doi.org/10.1109/MLHPC54614.2021.00009.
Kengo Nakajima, Takeshi Ogita and Masatoshi Kawai, “Efficient Parallel Multigrid Methods on Manycore Clusters with Double/Single Precision Computing,” Proc. 16th International Workshop on Automatic Performance Tuning (iWAPT 2021) in conjunction with IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2021), pp. 760–769, 2021, doi.org/10.1109/IPDPSW52791.2021.00114.
Yen-Chen Chen, and Kengo Nakajima, “Optimized Cascadic Multigrid Parareal Method for Explicit Time-Marching Schemes,” Proc. 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2021), in conjunction with SC21, pp. 9–18, 2021, doi.org/10.1109/ScalA54577.2021.00007.
Daichi Mukunoki, Yusuke Hirota, and Toshiyuki Imamura, “Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs,” Proc. 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2021), pp. 234–241, 2021, https://doi.org/10.1109/MCSoC51149.2021.00042.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, and Toshiyuki Imamura, “Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme,” Proc. The 50th International Conference on Parallel Processing (ICPP 2021), No. 78, pp. 1–11, Aug. 9, 2021, doi.org/10.1145/3472456.3472493.
Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi, “A rapid Euclidean norm calculation algorithm that reduces overflow and underflow,” Proc. The 2021 International Conference on Computational Science and Its Applications (ICCSA 2021), Lecture Notes in Computer Science, Vol. 12949, pp. 95–110, Sep. 9, 2021, doi.org/10.1007/978-3-030-86653-2_7.
Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, and Satoshi Matsuoka, “Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?” Proc. 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), pp. 1056–1065, Jun. 28, 2021, doi.org/10.1109/IPDPS49936.2021.00114.
Katsuhisa Ozaki, Takeshi Ogita, and Daichi Mukunoki, “Interval Matrix Multiplication using Fast Low-Precision Arithmetic on GPU,” Proc. 9th International Workshop on Reliable Engineering Computing (REC 2021), pp. 419–434, May 2021, http://ww2new.unime.it/REC2021/papers/REC2021-18.pdf
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, and Roman Iakymchuk, “Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme,” Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2021), pp. 100–109, 2021, doi.org/10.1145/3432261.3432270

2020

Conference Proceedings

Yutaka Ishikawa, Mitsuhisa Sato, Toshiyuki Imamura, Yuetsu Kodama, Shuhei Kudo, Keigo Nitadori, Takuya Ina, Masahiro Nakao, Koji Ueno, Katsuki Fujisawa, Toshiyuki Shimizu, Ikuo Miyoshi, Hideki Miwa, and Satoshi Hosoi, “Feat of Winning Four Major Benchmarks on Supercomputer Fugaku,” the Journal of the Institute of Electronics, Information and Communication Engineers (IEICE), Vol. 103, No. 12, pp. 1217–1220, 2020 (in Japanese, 石川裕, 佐藤三久, 今村俊幸, 児玉祐悦, 工藤周平, 似鳥啓吾, 伊奈拓也, 中尾昌広, 上野晃司, 藤澤克樹, 清水俊幸, 三吉郁夫, 三輪英樹, 細井聡: 「スーパーコンピュータ「富岳」4冠達成」, 電子情報通信学会誌), https://www.journal.ieice.org/summary.php?id=k103_12_1217&year=2020&lang=J.
Hisashi Yashiro, Koji Terasaki, Yuta Kawai, Shuhei Kudo, Takemasa Miyoshi, Toshiyuki Imamura, Kazuo Minami, Hikaru Inoue, Tatsuo Nishiki, Takayuki Saji, Masaki Satoh, and Hirofumi Tomita, “A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations,” Proc. the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 20), Article No. 1, pp. 1–10, November 2020, https://doi.org/10.1109/SC41405.2020.00005, (Gordon Bell Finalist paper).
Yasuhiro Idomura, Takuya Ina, Yussuf Ali, and Toshiyuki Imamura, “Acceleration of fusion plasma turbulence simulations using mixed-precision communication-avoiding Krylov method,” Proc. the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 20), Article No. 93, pp. 1–13, November 2020, https://doi.org/10.1109/SC41405.2020.00097.
Shuhei Kudo, Keigo Nitadori, Takuya Ina, and Toshiyuki Imamura, “Implementation and Numerical techniques for One Eflop/s HPL-AI benchmark on Fugaku,” Proc. IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2020), pp. 69–76, 2020, https://doi.org/10.1109/ScalA51936.2020.00014.
Fabienne Jezequel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, and Roman Iakymchuk, “Can we avoid rounding-error estimation in HPC codes and still get trustful results?” Proc. 13th International Workshop on Numerical Software Verification 2020 (NSV 20), Lecture Notes in Computer Science, Vol. 12549, pp. 163–177, Dec. 2020, http://dx.doi.org/10.1007/978-3-030-63618-0_10.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, DGEMM using Tensor Cores, and Its Accurate and Reproducible Versions, ISC High Performance 2020, Lecture Notes in Computer Science, Vol. 12151, pp. 230-248, June 15, 2020, doi.org/10.1007/978-3-030-50743-5_12.
Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk, Can we avoid rounding-error estimation in HPC codes and still get trustful results?, 13th International Workshop on Numerical Software Verification 2020 (NSV 20), June 1, 2020, https://hal.archives-ouvertes.fr/hal-02486753/.
Yussuf Ali, Naoyuki Onodera, Yasuhiro Idomura, Takuya Ina, Toshiyuki Imamura, GPU Acceleration of Communication Avoiding Chebyshev Basis Conjugate Gradient Solver for Multiphase CFD Simulations, 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), pp. 1–8, January 6, 2020, doi.org/10.1109/ScalA49573.2019.00006.
Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki, Design of an FPGA-based Matrix Multiplier with Task Parallelism, Proc. International Conference on Parallel Computing (ParCo2019), Parallel Computing: Technology Trends, Vol. 36, pp. 241-250, January 1, 2020, doi.org/10.3233/APC200047.

Poster Presentations

Tan Yiyu, Toshiyuki Imamura, Daichi Mukunoki, An FPGA-based Matrix Multiplier with Task Parallelism, The 2nd R-CCS International Symposium, Kobe, Japan, February 17, 2020.
Kento Sato, Akiyoshi Kuroda, Kazuo Minami, Jens Domke, Aleksandr Drozd, Mohamed Wahib, Shuhei Kudo, Toshiyuki Imamura, Kiyoshi Kumahata, Keigo Nitadori, Kazuo Ando, Satoshi Matsuoka, DL4Fugaku: Deep learning for Fugaku - Scalability Performance Extrapolation –, The 2nd R-CCS international symposium, Kobe, Japan, February 11, 2020.
Toshiyuki Imamura, Daichi Mukunoki, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, The 2nd R-CCS international symposium, Kobe, Japan, February 11, 2020.
Yasuhiro Idomura, Takuya Ina, Yussuf Ali, Toshiyuki Imamura, Optimization of Fusion Plasma Turbulence Code GT5D on FUGAKU and SUMMIT, The 2nd R-CCS international symposium, Kobe, Japan, February 11, 2020.
Toshiyuki Imamura, Yusuke Hirota, Takuya Ina, Re-design of parallel divide and conquer algorithm for a symmetric band matrix, The 2nd R-CCS international symposium, Kobe, Japan, February 11, 2020.
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Accurate DGEMM using Tensor Cores, HPC Asia 2020, Fukuoka, Japan, January 16, 2020.
Roman Iakymchuk, Fabienne Jezequel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku, Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations, International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia), Fukuoka, Japan, January 16, 2020.

Oral Presentations

Toshiyuki Imamura, Tan Yiyu, Precision Tuning of the Arithmetic Units in Matrix Multiplication on FPGA, SIAM Conference on Parallel Processing for Scientific Computing (PP20), Seattle, U.S., February 16, 2020.
Toshiyuki Imamura, Overview of minimal-precision computing and (weak)-numerical reproducibility, Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2020 January), R-CCS, Kobe, Japan, January 29–30, 2020.
Tan Yiyu, Precision Tuning of the Arithmetic Units in Matrix Multiplication on FPGA, Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2020 January), Kobe, Japan, January 29, 2020.
Daichi Mukunoki, Takeshi Ogita, High-performance Implementations of Accurate Linear Algebra Kernels on GPUs, 3rd International Conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST-2020), Ghaziabad, India, January 10, 2020.

2019

Conference Proceedings

椋木大地, 荻田武史, 尾崎克久, 今村俊幸, 尾崎スキームによる高精度かつ再現性のあるBLAS実装, 日本応用数理学会2019年年会講演予稿集, pp. 402-403, September 3, 2019 (in Japanese).
工藤周平, ヤコビ回転カーネルを用いたヤコビ固有値計算手法の性能評価, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2019-HPC-190, No. 35, pp. 1-8, July 17, 2019, http://id.nii.ac.jp/1001/00198082/, (in Japanese).
工藤周平, 今村俊幸, 高い演算密度をもつヤコビ回転カーネルの構成手法, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2019-HPC-168, No. 20, pp. 1-9, February 26, 2019, http://id.nii.ac.jp/1001/00194702/, (in Japanese).
Shuhei Kudo, Toshiyuki Imamura, Cache-efficient implementation and batching of tridiagonalization on manycore CPUs, HPC Asia 2019, pp. 71-80, January 15, 2019, http://doi.org/10.1145/3293320.3293329.

Poster Presentations

Tan Yiyu, Toshiyuki Imamura, High-order FDTD Method for Room Acoustic Simulation, the 40th Symposium on Ultrasonic Electronics (USE 2019), Tokyo, Japan, November 26, 2019.
Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku, Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), Denver, USA, November 19, 2019, https://sc19.supercomputing.org/proceedings/tech_poster/tech_poster_pages/rpost206.html.
Tan Yiyu, Daichi Mukunoki, Toshiyuki Imamura, Norihisa Fujita, Taisuke Boku, Reduced and Extended-Precision Computations on FPGAs and GPUs, The 11th symposium on Discovery, Fusion, Creation of New Knowledge by Multidisciplinary Computational Sciences, Tsukuba, Japan, October 15, 2019.
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Accurate and Reproducible Linear Algebra Operations for Many-core Architectures, Russian Supercomputing Days 2019 (RuSCDays 2019), Moscow, Russia, September 23, 2019.
Tan Yiyu, Toshiyuki Imamura, A FPGA-based Accelerator for Sound Field Rendering, the 22nd International Conference on Digital Audio Effects (DAFx-19), Birmingham, UK, September 4, 2019.
Tan Yiyu, Toshiyuki Imamura, Design of an FPGA-based Matrix Multiplier with Task Parallelism, ISC High Performance 2019, June 16-20, Frankfurt, Germany, June 18, 2019.
Toshiyuki Imamura, Yusuke Hirota, Daichi Mukunoki, Shuhei Kudo, Akiyoshi Kuroda, Naoki Sueyasu, Development of Scientific Numerical Libraries on post-K computer, The 1st R-CCS International Symposium, Kobe, Japan, February 18, 2019.

Oral Presentations

今村俊幸, エクサ時代の非同期タスクを応用した高性能高次元数値線形代数の研究, 第11回自動チューニング技術の現状と応用に関するシンポジウム(ATTA2019), 東京大学弥生講堂一条ホール, December 23, 2019 (in Japanese).
Toshiyuki Imamura, Numerical Reproducibility based on Minimal-Precision Validation, Computational Reproducibility at Exascale Workshop (CRE2019), Denver, USA, November 17, 2019, http://www.cs.fsu.edu/~cre/cre-2019/papers/SC19-CRE2019_paper_3.pdf.
Toshiyuki Imamura, Daichi Mukunoki, Roman Iakymchuk, Fabienne Jézéquel, Stef Graillat, Numerical software on Fugaku, Joint US-Japan Workshop on PostK-ECP Collaboration and JIFT Exascale Computing Collaboration, Kobe, Japan, October 29, 2019.
Shuhei Kudo, Toshiyuki Imamura, A level-3 BLAS like kernel of the Jacobi rotations for the Jacobi’s eigenvalue algorithm, The 13th workshop in the series of the "Parallel Numerics" workshops (ParNum 2019), Dubrovnik, Croatia, October 28, 2019.
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Accurate and Reproducible CG Method on GPUs, European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019), Egmond aan Zee, Netherlands, October 1, 2019.
今村俊幸, 「京」ならびに「富岳」に向けた並列固有値計算ソルバについて, 第4回 High Performance Computing Physics (HPC-Phys) 勉強会, 神戸, August 26, 2019 (in Japanese).
Daichi Mukunoki, High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs, Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2019 June), Kobe, Japan, June 7, 2019.
Toshiyuki Imamura, High Precision Floating and Integer Arithmetic on Supercomputing Environment, Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2019 June), Kobe, Japan, June 6-7, 2019.
工藤周平, 今村俊幸, オンデマンドな行列計算カーネル生成機構の構想, 第24回計算工学講演会, 大宮市, May 30, 2019 (in Japanese).
椋木大地, 尾崎スキームに基づく高精度かつ再現性のあるBLASルーチンの実装と自動チューニングの適用, 第22回AT研究会オープンアカデミックセッション（ATOS22）, 東京, May 13, 2019 (in Japanese).
Toshiyuki Imamura, Research advancement in autotuning in libraries and applications, 9th JLESC meeting at ICL, Knoxville, TN, US, April 15-17, 2019.
Toshiyuki Imamura, Inge Gutheil, Review on standard eigensolvers on a high-end GPU system, 9th JLESC meeting at ICL, Knoxville, TN, US, April 15-17, 2019.
Toshiyuki Imamura, High Performance Eigensolver Exploiting an Online Tuning Mechanism, SIAM Conference on Computational Science and Engineering (CSE19), Spokane, WA, USA, March 1, 2019.
Tan Yiyu, Toshiyuki Imamura, Design and Implementation of an OpenCL based Matrix Multiplier, 2019 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, Kaohsiung, Taiwan, February 15, 2019.
Toshiyuki Imamura, Development of a Dense Eigenvalue solver for Exa-scale Systems, International Workshop on Massively Parallel Programming for Quantum Chemistry and Physics 2019, Kobe, Japan, January 15, 2019.

2018

Journal Articles

Tan Yiyu, A Hardware-oriented Object Model for Java in an Embedded Processor, Microprocessors and Microsystems, Vol. 63, pp. 85-97, November 1, 2018, http://doi.org/10.1016/j.micpro.2018.08.007.

Conference Proceedings

Haruka Yamada, Akira Imakura, Toshiyuki Imamura, Tetsuya Sakurai, Optimization of reordering procedures in HOTRG for distributed parallel computing, 2018 IEEE International Parallel and Distributed Processing Symposium Workshop, IPDPS Workshops 2018, pp. 957-966, August 8, 2018, http://doi.org/10.1109/IPDPSW.2018.00150.
工藤周平, 今村俊幸, 三重対角化におけるメニーコア環境に適した同期手法, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2018-HPC-165, No. 33, pp. 1-8, July 23, 2018, http://id.nii.ac.jp/1001/00190586/ (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster, Proc. International Conference on Computational Science (ICCS2018), Lecture Notes in Computer Science, Vol. 10862, pp. 853-858, June 12, 2018, https://doi.org/10.1007/978-3-319-93713-7_85.
Takeshi Fukaya, Toshiyuki Imamura, Yusaku Yamamoto, A case study on modeling the performance of dense matrix computation: Tridiagonalization in the EigenExa eigensolver on the K computer, Proceedings of 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1113-1122, May 25, 2018, http://doi.org/10.1109/IPDPSW.2018.00171.
Daichi Mukunoki, Toshiyuki Imamura, Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer, Proc. 12th International Conference on Parallel Processing and Applied Mathematics (PPAM2017), Lecture Notes in Computer Science, Vol. 10777, pp. 348-358, March 2018, https://doi.org/10.1007/978-3-319-78024-5_31.
青木聖陽, 今村俊幸, 横川三津夫, 廣田悠輔, メニーコアプロセッサにおける多軸分割を用いた3次元FFTの性能評価, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2018-HPC-163, No.29, pp. 1-7, February 21, 2018, http://id.nii.ac.jp/1001/00185970/ (in Japanese).

Poster Presentations

山田悠加, 今倉暁, 今村俊幸, 櫻井鉄也, 計算手順と配列の並び替え手順の最適化によるn次元HOTRGの計算時間の削減, 日本応用数理学会2018年会, 名古屋大学東山キャンパス, September 4, 2018, (in Japanese).
Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura, Automatic Generation of Full-set Batched BLAS, International Supercomputing Conference (ISC18), Frankfurt, Germany, June 26, 2018.
Tan Yiyu, Toshiyuki Imamura, Performance Evaluation of a Toolkit for Sparse Tensor Decomposition, The 27th International Symposium on High-performance Parallel and Distributed Computing, Tempe, Arizona, USA, June 13, 2018.

Oral Presentations

Haruka Yamada, Akira Imakura, Toshiyuki Imamura, Tetsuya Sakurai, Time-efficient tensor reordering procedures for HOTRG in distributed parallel environment, Tensor Network States: Algorithms and Applications (TNSAA), Kobe, December 3, 2018.
今村俊幸, EigenExaへのオンライン自動チューニング活用の試みについて, 日本応用数理学会2018年会, 名古屋大学東山キャンパス, September 5, 2018 (in Japanese).
山田悠加, 今倉暁, 今村俊幸, 櫻井鉄也, n次元モデル向けHOTRGの分散並列計算における配列の並び替えの最適化, 日本応用数理学会「行列・固有値問題の解法とその応用」研究部会　第25回研究会, September 3-5, 2018 (in Japanese).
Tan Yiyu, Toshiyuki Imamura, Performance Evaluation and Tuning of an OpenCL based Matrix Multiplier, The 24th International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, USA, July 31, 2018.
Toshiyuki Imamura, Development of a Dense Eigenvalue Solver for Exa-Scale Systems, "MolSSI Workshop / ELSI Conference: Solving or Circumventing Eigenvalue Problems in Electronic Structure Theory", Richmond, USA, July 30, 2018.
Toshiyuki Imamura, Y. Idomura, T. Ina, S. Yamashita, N. Onodera, A. Yussuf, S. Yamada, Development of exascale matrix solver based on communication avoiding algorithms, 4th US-Japan Joint Institute for Fusion Theory Workshop on innovations and co-designs of fusion simulations towards extreme scale computing, Princeton, USA, July 30, 2018.
工藤周平, 今村俊幸, 小・中規模固有値計算の高性能実装, データ駆動科学と高速計算科学, 東京, July 17, 2018, (in Japanese).
Toshiyuki Imamura, Communication-Avoiding approaches of dense Eigenvalue / SVD problems, 10th International Workshop on Parallel Matrix Algorithms and Applications, Zurich, Switzerland, June 27, 2018.
Toshiyuki Imamura, Inge Gutheil, Project Report: HPC libraries for solving dense symmetric eigenvalue problems, 8th JLESC workshop, Barcelona Supercomputing Center, Barcelona, Spain, April 17, 2018.
Toshiyuki Imamura, Communication avoiding approach for reducing to tri-diagonal, bi-diagonal, and Hessenberg forms, SIAM Parallel Processing (SIAM PP2018), Tokyo, Japan, March 7-10, 2018 (in Japanese).
Tan Yiyu, Toshiyuki Imamura, The SPLATT Toolkit on the K Computer, SIAM Conference on Parallel Processing for Scientific Computing, Tokyo, Japan, March 7, 2018.
Toshiyuki Imamura, Overview of the EigenExa project, past, present and future, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA 2018), Tsukuba, Japan, March 5-6, 2018.
廣田悠輔, 今村俊幸, メニーコア環境における高性能分割統治法ソルバの研究, 科研費基盤B課題「O(1億)コア環境におけるスケーラブルな数値計算ソフトウェアの理論と応用」ワークショップ, 札幌市, January 23, 2018 (in Japanese).

2017

Journal Articles

園田大二郎, 大井祥栄, 龍野智哉, 2次元渦度方程式へのPerfectly Matched Layerに基づく仮想吸収層の実装, 日本応用数理学会論文誌, Vol. 27, No. 2, pp. 84-111, June 25, 2017, https://doi.org/10.11540/jsiamt.27.2_84 (in Japanese).

Conference Proceedings

Tan Yiyu, Yasushi Inoguchi, Makoto Otani, Yukio Iwaya, Takao Tsuchiya, A Real-Time Sound Field Rendering Processor, Applied Sciences, 2018, 8(1), 35, December 28, 2017, http://doi.org/10.3390/app8010035.
山田進, 今村俊幸, 町田昌彦, LOBPCG法を用いたハバードモデルの厳密対角化：複数固有値に対する省通信ノイマン展開前処理の有効性, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2017-HPC-162, No. 4, pp. 1-6, December 11, 2017, http://id.nii.ac.jp/1001/00184796/ (in Japanese).
Yasuhiro Idomura, Takuya Ina, Akie Mayumi, Susumu Yamada, K. Matsumoto, Yuuichi Asahi, Toshiyuki Imamura, Application of a communication-avoiding generalized minimal residual method to a gyrokinetic five dimensional eulerian code on many core platforms, Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA@SC 2017, pp. 7:1-7:8, November 10, 2017, http://doi.org/10.1145/3148226.3148234.
Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida, Design Towards Modern High Performance LA Library Enabling Heterogeneity and Flexible Data Formats, Parallel Computing is Everywhere, Proc. International Conference on Parallel Computing (ParCo2017), Advances in Parallel Computing, pp. 97-106, Sep. 2017, http://ebooks.iospress.nl/volumearticle/48598.
青木聖陽, 廣田悠輔, 今村俊幸, 横川三津夫, FFTカーネルを用いたKNLでのスケーラビリティに関する調査, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2017-HPC-161, No. 16, pp. 1-7, September 1, 2017, http://id.nii.ac.jp/1001/00183483/ (in Japanese).
Susumu Yamada, Toshiyuki Imamura, Takuya Ina, Narimasa Sasa, Yasuhiro Idomura, Masahiko Machida, Quadruple-precision BLAS using Bailey's arithmetic with FMA instruction: Its performance and applications, Proc. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1418-1425, July 10, 2017, http://doi.org/10.1109/IPDPSW.2017.42
椋木大地, 今村俊幸, 京コンピュータにおける2.5次元アルゴリズムを用いた分散並列行列積の実装と評価, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2017-HPC-159, No. 1, pp. 1-6, April 10, 2017, http://id.nii.ac.jp/1001/00178519/ (in Japanese).
廣田悠輔, 今村俊幸, メニーコアプロセッサ向け分割統治法の実装技術, 情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, Vol. 2017-HPC-158, No. 20, pp. 1-9, March 1, 2017, http://id.nii.ac.jp/1001/00177837/ (in Japanese).

Poster Presentations

Tan Yiyu, Toshiyuki Imamura, An Energy-Efficient FPGA-based Matrix Multiplier, The 24th IEEE International Conference on Electronics, Circuits and Systems, Batumi, Georgia, December 6, 2017.
Yusuke Hirota, Toshiyuki Imamura, Development of Banded Eigenvalue Solvers for Shared Memory Parallel Computers, The 7th AICS International Symposium, Kobe, Japan, February 23, 2017.

Oral Presentations

椋木大地, 次世代計算機のための数値計算ライブラリの実装技術, 日本応用数理学会三部会連携「応用数理セミナー」, 東京都, December 26, 2017 (in Japanese).
Toshiyuki Imamura, Tetsuzou Usui, Porting and Optimization of Numerical Libraries for Arm SVE, Arm HPC Workshop by RIKEN AICS and Linaro, Tokyo, Japan, December 12, 2017.
Yusuke Hirota, Toshiyuki Imamura, Performance Analysis of a Dense Eigenvalue Solver on the K Computer, The 36th JSST Annual International Conference on Simulation Technology, Tokyo, Japan, October 26, 2017.
Susumu Yamada, Toshiyuki Imamura, Masahiko Machida, Communication avoiding Neumann expansion preconditioner for LOBPCG method: Convergence property of exact diagonalization method for Hubbard model, International Conference on Parallel Computing, Bologna, Italy, September 12, 2017.
Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida, Towards Modern High Performance LA Library Enabling Heterogeneity and Flexible Data Formats, International Conference on Parallel Computing, Bologna, Italy, September 12, 2017.
真弓明恵, 井戸村泰宏, 伊奈拓也, 山田進, 今村俊幸, 多相流体コードJUPITERにおける前処理付きChebyshev基底CG法ソルバの収束特性評価, 日本原子力学会2017年秋の大会, 札幌市, September 1, 2017 (in Japanese).
大井祥栄, Parareal手法を用いた時間並列計算の性能評価, 第46回数値解析シンポジウム (NAS2017), 高島市, June 28-30, 2017 (in Japanese).
廣田悠輔, 今村俊幸, 四倍精度固有値ソルバの性能分析およびポストムーア時代の倍精度固有値ソルバの性能予測, 第5回大規模並列数値計算技術に関する研究集会, 神戸市, March 28, 2017 (in Japanese).
Yusuke Hirota, Susumu Yamada, Toshiyuki Imamura, Narimasa Sasa, Yasuhiro Idomura, Takuya Ina and Masahiko Machida, Performance of the Quadruple Precision Eigensolver Library QPEigenK on Supercomputer Systems, SIAM Conference on Computational Science and Engineering 2017 (SIAM CSE17), Atlanta, USA, February 27–March 3, 2017.
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi, Implementation Techniques for High Performance BLAS Kernels on Modern GPUs, SIAM Conference on Computational Science and Engineering (CSE17), Atlanta, USA, February 28, 2017.
Toshiyuki Imamura, Yusuke Hirota, Susumu Yamada and Masahiko Machida, Communication Avoiding and Synchronous Reducing Techniques for Dense Parallel Eigenvalue Solver, SIAM Conference on Computational Science and Engineering (CSE17), Atlanta, USA, February 27–March 3, 2017.
Toshiyuki Imamura, Acceleration of the EigenG solver on a consumer-ranged GPU, 2017 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing, Taipei, Taiwan, March 2017.
真弓明恵，井戸村泰宏，伊奈拓哉，山田進，今村俊幸, 多相流体コードJUPITERにおける前処理付き省通信CG法ソルバの開発, 日本原子力学会2017年春の年会, 東海大湘南キャンパス, March 27-29, 2017 (in Japanese).
椋木大地, 今村俊幸, Reduced-/Extended-precision BLASの実装方法の検討, 第5回大規模並列数値計算技術に関する研究集会, 2017年3月27日 (in Japanese).
廣田悠輔，帯行列固有値問題に対する高性能分割統治法アルゴリズム，ワークショップ「行列計算のための数値計算法」，名古屋大学工学部3号館，名古屋市，2017年1月19–20日 (in Japanese).

2016

Journal Articles | 学術論文

今村俊幸, コンピュータを用いた大規模な固有値計算, 数理科学 2016年12月号, No. 642, pp. 6-12, 2016.12.1.
Narimasa Sasa, Susumu Yamada, Masahiko Machida, and Toshiyuki Imamura, Accumulated Error in Iterative Use of FFT, Nonlinear Theory and Its Applications, IEICE, Vol. 7 (2016), No. 3, pp. 354-361, 2016.7.1.
Yosuke Kumagai, Akihiro Fujii, Teruo Tanaka, Yusuke Hirota, Takeshi Fukaya, Toshiyuki Imamura and Reiji Suda, Performance Analysis of the Chebyshev Basis Conjugate Gradient Method on the K Computer, Proc. PPAM2015, Lecture Notes in Computer Science (LNCS), Vol. 9573, pp. 74–85, 2016.
Susumu Yamada, Toshiyuki Imamura and Masahiko Machida, High Performance Eigenvalue Solver in Exact-diagonalization Method for Hubbard Model on CUDA GPU, Proc. ParCo2015, Advances in Parallel Computing, Vol. 27: Parallel Computing: On the Road to Exascale, pp. 361-369, IOS Press, 2016.4.
Toshiyuki Imamura, Takeshi Fukaya, Yusuke Hirota, Susumu Yamada and Masahiko Machida, CAHTR: Communication-Avoiding Householder TRidiagonalization, Proc. ParCo2015, Advances in Parallel Computing, Vol. 27: Parallel Computing: On the Road to Exascale, pp. 381–390, 2016.

Presentations at International Conference | 国際会議発表

Toshiyuki Imamura, Tetsuya Sakurai, Yasunori Futamura, Experiences on K computer from a topic focused on the large-scale eigenvalue solver project, 24th Workshop on Sustained Simulation Performance, HLRS, Stuttgart, Germany, December 6, 2016.
Toshiyuki Imamura, Large-scale eigenvalue computation for dense matrices on the K computer, 1st International Symposium on Research and Education of Computational Science (RECS), U. Tokyo, 2016.11.29 (invited talk).
A. Mayumi, Y. Idomura, T. Ina, S. Yamada, T. Imamura, Left-Preconditioned Communication-Avoiding Conjugate Gradient Methods for Multiphase CFD Simulations on the K Computer, Proceedings of ScalA’16, IEEE, November 13, 2016.
Daichi Mukunoki and Toshiyuki Imamura: Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation, Proc. IEEE International Conference on Cluster Computing (Cluster 2016), pp. 144-145, Sep. 2016.
Toshiyuki Imamura, Parallel dense eigenvalue solver and SVD solver for post-petascale computing systems, The 9th International Workshop on Parallel Matrix Algorithms and Applications (PMAA16), Bordeaux, France, 2016.7.7.
Yusuke Hirota and Toshiyuki Imamura, Performance Analysis of the Quadruple Precision Eigensolver Library QPEigenK on the K Computer, 9th International Workshop on Parallel Matrix Algorithms and Applications (PMAA16), Place de la Victoire, Bordeaux, France, July 6–8, 2016.
Yusuke Hirota, Susumu Yamada, Toshiyuki Imamura, Narimasa Sasa and Masahiko Machida, Performance of Quadruple Precision Eigenvalue Solver Libraries QPEigenK and QPEigenG on the K Computer, HPC in Asia Poster, in conjunction with International Supercomputing Conference (ISC’16), Messe Frankfurt, Frankfurt, Germany, June 19–23, 2016 (poster and ceremony talk) [HPC in Asia Poster Award].
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin’ichi Oishi: Performance Evaluation of Verified Computation for Linear Systems on Supercomputer, SIAM: East Asian Section Conference (EASIAM 2016), University of Macau, Jun. 20-22, 2016.
Toshiyuki Imamura, Auto-Tuning for Eigenvalue Solver on the Post Moore’s Era, SIAM Conference on Parallel Processing for Scientific Computing (PP16), Paris, France, 2016.4.16.
Takeshi Fukaya, Toshiyuki Imamura, An Impact of Tuning the Kernel of the Structured QR Factorization in the TSQR, SIAM Conference on Parallel Processing for Scientific Computing (PP16), Paris, France, 2016.4.15.
Susumu Yamada, Toshiyuki Imamura, Masahiko Machida, High performance eigenvalue solver for Hubbard model on CPU-GPU hybrid platform, SIAM Conference on Parallel Processing for Scientific Computing (PP16), Paris, France, 2016.4.13.
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16). pp. 377-384, Sep. 2016.
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA, 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016), Mathematics Research Center, National Taiwan University, Taipei, Feb. 19, 2016 (Invited).
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS, The 6th AICS International Symposium, Feb. 2016.

Presentations at Domestic Conference | 国内学会発表

今村俊幸, 非同期的な数学的アルゴリズムのソフトウェアの可能性, 第8回自動チューニング技術の現状と応用に関するシンポジウム(ATTA2016), 東京大学, 2016.12.25.
今村俊幸, 椋木大地: コンシューマレンジGPUに最適化した固有値ソルバーの実装と評価, 情報処理学会研究報告, Vol. 2016-HPC-157, No. 7, 2016年12月.
森倉悠介, 椋木大地, 深谷猛, 山中脩也, 大石進一: 大規模並列計算機における連立1次方程式の精度保証付き数値計算に対する性能評価, 情報処理学会研究報告, Vol. 2016-HPC-157, No. 1, 2016年12月.
大井祥栄, 時間並列計算手法に関する研究開発動向の調査について, 平成28年度自動チューニング研究会マイクロワークショップ, 登別温泉第一滝本館, 北海道登別市, 2016/10/31.
廣田悠輔，今村俊幸，メニーコアCPUにおける分割統治法ルーチンの性能評価，平成28年度自動チューニング研究会マイクロワークショップ，第一滝本館，登別市，2016年10月30–31日．
椋木大地, 今村俊幸, 高橋大介: PascalアーキテクチャGPUにおける線形計算カーネルの実装技術の検討, GTC Japan 2016, 2016年10月.
大井祥栄, 廣田悠輔, 椋木大地, 今村俊幸: KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-, 応用数理学会2016年度年会, 2016年9月.
永井佑紀，篠原康，山田進，二村保徳，今村俊幸，櫻井鉄也，動的平均場理論に対する、LOBPCG法とshifted COCG法を用いた厳密対角化ソルバー，日本物理学会・2016年秋季大会，金沢大学角間キャンパス，2016.9.13.
廣田悠輔，山田進，今村俊幸，佐々成正，町田昌彦, 4倍精度固有値ソルバライブラリQPEigenKの京コンピュータにおける性能分析，日本応用数理学会 2016年度年会，北九州国際会議場，北九州市，2016年9月12–14日．
廣田悠輔，今村俊幸，4倍精度固有値ソルバの京コンピュータにおける性能分析，研究集会「応用可積分系の進展」，しあわせの村，神戸市，2016年8月28–30日．
折居茂夫, 今村俊幸, 山本義郎, モデルパラメータに非負制約を課した回帰モデルによる大規模並列計算の性能予測, 情報処理学会研究報告, Vol. 2016-HPC-155, No. 19, 2016.8.
大井祥栄, 3次元Meshless Time-Domain Methodの高速化, 第45回数値解析シンポジウム (NAS2016), 霧島ホテル, 鹿児島県霧島市, 2016/06/08.
園田大二郎, 大井祥栄, 龍野智哉, 渦度方程式への仮想吸収層への実装, 第45回数値解析シンポジウム (NAS2016), 霧島ホテル, 鹿児島県霧島市, 2016/06/08.
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka and Shin’ichi Oishi: Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers, 2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016), Jan. 2016.

2015

Journal Articles | 学術論文

廣田悠輔，今村俊幸，帯行列の一般化固有値問題向け分割統治法，情報処理学会論文誌コンピューティングシステム（ACS），Vol. 8，No. 4, pp. 78–87，2015．
Yoshiharu Ohi, Soichiro Ikuno, Numerical Investigation of Electromagnetic Wave Propagation Phenomena by Three-Dimensional Meshless Time-Domain Method, Plasma and Fusion Research, Vol. 10, 3406072, 2015/03/09. (in English)

Presentations at International Conference | 国際会議発表

Yoshiharu Ohi, Soichiro Ikuno, Stability analysis on nodes arrangement in Meshless Time-Domain Method, 25th International TOKI Conference (ITC25), Ceratopia Toki, Toki, Gifu, 2015/11/04. (in English)
Yusuke Hirota and Toshiyuki Imamura, Divide-and-Conquer Method for Symmetric-Definite Generalized Eigenvalue Problems of Banded Matrices on Manycore Systems, SIAM Conference on Applied Linear Algebra 2015 (SIAM LA15), Hyatt Regency Atlanta, Atlanta, USA, October 26–30, 2015.
Yoshiharu Ohi, Soichiro Ikuno, Numerical investigation on Electromagnetic wave propagation phenomena using meshless time domain method in complex shaped domain, The 17th International Symposium on Applied Electromagnetics and Mechanics, Awaji-yumebutai, Awaji, Hyogo, 2015/09/17. (in English)
Yusuke Hirota and Toshiyuki Imamura, Performance of Divide-and-Conquer Method for Symmetric-Definite Generalized Eigenvalue Problems of Banded Matrices on Multicore and Manycore Systems, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA) 2015, International Congress Center Epochal Tsukuba, Tsukuba, Japan, September 14–16, 2015 (poster).
Toshiyuki Imamura, “The EigenExa library: dense eigenvalue solver for post-petascale computing”, SSExa2015, Greifswald, Germany, March 22-25, 2015 (poster).
Yoshiharu Ohi, Soichiro Ikuno, Numerical Investigation of Influence of Node Alignment on Stable Calculation for Meshless Time Domain Method, SIAM CSE2015, Salt Lake City, USA, 2015/03/16. (in English)
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations, GPU Technology Conference (GTC 2015), Mar. 2015.
Takeshi Fukaya and Toshiyuki Imamura, “Performance Evaluation of EigenExa Dense Eigensolver on the Oakleaf-Fx Supercomputer System”, SIAM Conference on Computational Science and Engineering (CSE15), Salt Lake City, USA, March 14, 2015.
Daichi Mukunoki, Toshiyuki Imamura and Daisuke Takahashi: Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650, Mar. 2015.
Toshiyuki Imamura, “Automatic-tuning for CUDA-BLAS kernels parameter by multi-stage d-Spline”, 2015 Conference on Advanced Topics and Auto Tuning in High-Performance and Scientific Computing (2015 ATAT in HPSC), Taipei, Taiwan, February 27-28, 2015.
Takeshi Fukaya, “CholeskyQR2: an algorithm of the Cholesky QR factorization with reorthogonalization”, 2015 Conference on Advanced Topics and Auto Tuning in High-Performance and Scientific Computing (2015 ATAT in HPSC), Taipei, Taiwan, February 27-28, 2015.

Presentations at Domestic Conference | 国内学会発表

椋木大地, 今村俊幸: 短尺浮動小数点形式の検討, 情報処理学会研究報告, Vol. 2015-HPC-152, No. 4, 2015年12月.
大井祥栄, 時間並列計算 -Parareal in time algorithm-, AT研究会マイクロワークショップ, 山梨県甲府市, 2015/10/19.
廣田悠輔，今村俊幸，非同期アルゴリズムの類型とメニーコアプロセッサ向け同期削減技術の開発，平成27年度自動チューニング研究会マイクロワークショップ，KKR 甲府ニュー芙蓉，甲府市，2015年10月18–19日．
椋木大地, 今村俊幸, 高橋大介: GPUにおけるスレッド数自動選択機能を持ったメモリ律速な線形計算カーネル群「MUBLAS」の実装と評価, GTC Japan 2015, 2015年9月.
大井祥栄, 廣田悠輔, 椋木大地, 今村俊幸: 京コンピュータ向け数値計算ライブラリ群KMATHLIBの実装, 応用数理学会2015年度年会, 2015年9月.
佐々成正, 山田進, 町田昌彦, 椋木大地, 今村俊幸: FFTを使った時間発展問題における累積誤差, 応用数理学会2015年度年会講演論文集, 2015年9月.
今村俊幸, 椋木大地, 山田進, 町田昌彦: SYMV・GEMVルーチン群のマルチGPU化とその評価, 情報処理学会研究報告, Vol. 2015-HPC-151, No. 13, 2015年9月.
佐々木信一, 菱沼利彰, 藤井昭宏, 田中輝雄, 椋木大地, 今村俊幸: 京・FX10における倍々精度演算の高速化, 情報処理学会研究報告, Vol. 2015-HPC-151, No. 15, 2015年9月.
椋木大地, 今村俊幸, 高橋大介: NVIDIA GPUにおけるメモリ律速なBLASカーネルのスレッド数自動選択手法, 情報処理学会研究報告, Vol. 2015-HPC-150, No. 13, 2015年7月.
椋木大地, 今村俊幸, 高橋大介: NVIDIA GPUにおけるGEMVカーネルの自動チューニング, 計算工学講演会論文集, Vol. 20, E-2-1, 2015年6月.
廣田悠輔，固有値ソルバの現状とポストペタスケール環境に向けた展望， Cyber HPC Symposium，大阪大学（吹田キャンパス），大阪府吹田市，2015年3月20日．
今村俊幸, 椋木大地: CUDA-BLAS等の選択による最速GPU固有値ソルバーの性能評価, 情報処理学会研究報告, Vol. 2015-HPC-148, No. 4, 2015年2月.
大井祥栄, 藤田宜久, 伊東拓, 生野壮一郎, 有限要素に基づく節点配置を用いたMTDMの性能評価, 2014年度【プラズマ壁相互作用に関する新規シミュレーション手法開発に関する研究会】第1回非線形・可視化部門研究会, 核融合科学研究所, 岐阜県土岐市, 2015/01/27.
Takeshi Fukaya and Toshiyuki Imamura, “Performance evaluation of the EigenExa eigensolver on the Oakleaf-FX supercomputing system”, Annual Meeting on Advanced Computing System and Infrastructure (ACSI 2015), Tsukuba, Japan, January 27, 2015 (reviewed).
今村俊幸, 椋木大地, 佐々成正, 山田進, 町田昌彦: 疑似四倍精度拡張数学パッケージQP-Pack, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, 2015年1月.
佐々木信一, 藤井昭宏, 田中輝雄, 椋木大地, 今村俊幸: スーパコンピュータ京における倍々精度演算の高速化, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, 2015年1月.
椋木大地, 今村俊幸, 高橋大介: Kepler・MaxwellアーキテクチャGPUにおける性能が行列形状に依存しない高速なGEMVの実装, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, 2015年1月.

2014

Journal Articles | 学術論文

Yoshiharu Ohi, Yoshihisa Fujita, Taku Itoh, Hiroaki Nakamura, Soichiro Ikuno, Faster Generation of Shape Functions in Meshless Time Domain Method, Plasma and Fusion Research, Vol. 9, 3401144, 2014/09/03. (in English)
Takemasa Miyoshi, Keiichi Kondo, and Toshiyuki Imamura, “The 10,240-member ensemble Kalman filtering with an intermediate AGCM”, Geophysical Research Letters, Vol. 41, July 28, 2014.
Teruo Tanaka, Ryo Otsuka, Akihiro Fujii, Takahiro Katagiri, and Toshiyuki Imamura, “Implementation of d-Spline-based incremental performance parameter estimation method with ppOpen-AT”, Scientific Programming, Vol. 22, No. 4, pp. 299-307, 2014.
Yasuhiro Idomura, Motoki Nakata, Susumu Yamada, Masahiko Machida, Toshiyuki Imamura, Tomohiko Watanabe, Masanori Nunami, Hikaru Inoue, Shigenobu Tsutsumi, Ikuo Miyoshi, and Naoyuki Shida, “Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer”, International Journal of High Performance Computing Applications, Vol. 28, No. 1, pp. 73-86, 2014.
T. Imamura, S. Yamada and M. Machida, “Eigen-G: GPU-based eigenvalue solver for real-symmetric dense matrices”, 10th International Conference on Parallel Processing and Applied Mathematics (PPAM2013), Lecture Notes in Computer Science (LNCS) 8384, pp. 673–682, 2014.

Presentations at International Conference | 国際会議発表

Yusuke Hirota and Toshiyuki Imamura, Development of High Performance Parallel Real Random Number Generator KMATH_RANDOM, The 5th AICS International Symposium, RIKEN AICS, Kobe, Japan, December 8–9, 2014 (poster).
Takeshi Fukaya and Toshiyuki Imamura, “Performance evaluation of the EigenExa dense eigensolver on the K computer”, the 5th AICS International Symposium, Kobe, Japan, December 8-9, 2014 (poster).
Takeshi Fukaya, “Modeling the performance of parallel dense eigensolvers on peta/post-petascale systems”, JST/CREST International Symposium on Post Petascale System Software (ISP2S2), Kobe, Japan, December 2, 2014 (poster).
Takeshi Fukaya, Yuji Nakatsukasa, Yuka Yanagisawa, and Yusaku Yamamoto, “CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System”, Proc. the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), pp.31-38, New Orleans, USA, November 17, 2014.
Yoshiharu Ohi, Soichiro Ikuno, Numerical Investigation of Electromagnetic Wave Propagation Phenomena using 3D Meshless Time Domain Method, The 24th International Toki Conference (ITC-24), Ceratopia Toki, Toki, Gifu, 2014/11/06. (in English)
Yoshiharu Ohi, Soichiro Ikuno, Influence of Weight Function to Numerical Precision in 3D Meshless Time Domain Method, The 33rd JSST Annual Conference: International Conference on Simulation Technology (JSST2014), Kitakyushu International Conference Center, Kitakyushu-shi, Fukuoka, 2014/10/29. (in English)
Yusuke Hirota and Toshiyuki Imamura, Acceleration of Divide and Conquer Method for Generalized Eigenvalue Problems of Banded Matrices on Manycore Architectures, Parallel Matrix Algorithms and Applications 2014, University of Lugano, Lugano, Switzerland, July 2–4, 2014.
Takeshi Fukaya, Toshiyuki Imamura, and Yusaku Yamamoto, “Performance Analysis of the Householder-type Parallel Tall-Skinny QR Factorizations toward Automatic Algorithm Selection”, Proc. the Ninth International Workshop on Automatic Performance Tuning (iWAPT2014), pp.1-8, Eugene, USA, July 1, 2014.
Toshiyuki Imamura, “The EigenExa Library – High Performance & Scalable Direct Eigensolver for Large-Scale Computational Science”, International Supercomputing Conference (ISC14), HPC in Asia session, June 26, 2014 (invited talk).
Chongke Bi, Kenji Ono, Kwan-Liu Ma, Haiyuan Wu, and Toshiyuki Imamura, “A Study of Parallel Data Compression Using Proper Orthogonal Decomposition on the K Computer”, Proc. Eurographics Symposium on Parallel Graphics and Visualization (EGPGV2014), Swansea, UK, June 9-10, 2014.
Yoshiharu Ohi, Taku Itoh, Soichiro Ikuno, Numerical Investigations of 3D Electromagnetic Wave Propagation Phenomena Using Meshless Time Domain Method, The 16th Biennial IEEE Conference on Electromagnetic Field Computation (CEFC2014), Annecy, France, 2014/05/27. (in English)
T. Imamura, “Automatic-tuning for CUDA-BLAS kernels by Multi-stage d-Spline Pruning Strategy”, 2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2014 ATAT in HPSC), National Taiwan University, Taipei, Taiwan, Mar. 14–15, 2014.
T. Fukaya, “A Communication-Avoiding Algorithm for the Gram-Schmidt Orthogonalization”, 2014 Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing (2014 ATAT in HPSC), National Taiwan University, Taipei, Taiwan, Mar. 14–15, 2014.
Yusuke Hirota and Toshiyuki Imamura, Divide and Conquer Method for Computing Generalized Eigenvalues of Banded Matrices, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA) 2014, International Congress Center Epochal Tsukuba, Tsukuba, Japan, March 7–9, 2014 (poster).
T. Sakurai, S. L. Zhang, T. Imamura, Y. Yamamoto, Y. Kuramashi and T. Hoshi, “CREST project ‘Development of an Eigen-Supercomputing Engine using a Post-Petascale Hierarchical Model’”, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA 2014), Epochal Tsukuba, Mar. 7–9, 2014 (poster).
T. Imamura and Y. Yamamoto, “CREST: Dense Eigen-Engine Groups”, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA 2014), Epochal Tsukuba, Mar. 7–9, 2014 (poster).
T. Fukaya, Y. Yamamoto and T. Imamura, “An overview of parallel algorithms for tall-skinny QR factorizations”, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA 2014), Epochal Tsukuba, Mar. 7–9, 2014 (poster).
Y. Yanagisawa, Y. Nakatsukasa and T. Fukaya, “Cholesky-QR and Householder-QR factorizations in nonstandard inner product spaces”, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA 2014), Epochal Tsukuba, Mar. 7–9, 2014 (poster).
T. Fukaya and Y. Yamamoto, “Auto-tuning Tall and Skinny QR Factorization”, SIAM Conference on Parallel Processing for Scientific Computing (PP14), Portland, USA, Feb. 20, 2014.

Presentations at Domestic Conference | 国内学会発表

Takeshi Fukaya, “The Cholesky QR factorization in high-performance computing”, the 12th Computational Mathematics Conference, Dec. 28, 2014 (invited talk).
椋木大地, 今村俊幸: MaxwellアーキテクチャGPUにおける疑似倍精度演算を用いたDGEMMの実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2014-HPC-147, No. 26, 2014年12月.
Toshiyuki Imamura, “Review on large-scale dense eigenvalue computation”, 18th Symposium on Setouchi-rim JSIAM Local Research Group, Kurashiki, Japan, December 6, 2014.
大井祥栄, 生野壮一郎, 任意節点配置を用いたMTDMにおける安定性解析, 第23回MAGDAカンファレンス (MAGDA2014), サンポートホール高松, 香川県高松市, 2014/12/04.
大井祥栄, 生野壮一郎, Meshless Time Domain Methodを用いた電磁場解析における節点配置の計算精度および数値安定性への影響解析, 日本応用数理学会2014年度年会, 政策研究大学院大学, 東京都港区, 2014/10/29.
今村俊幸, 椋木大地, 山田進, 町田昌彦: CUDA-xSYMVの実装と評価, 情報処理学会研究報告: ハイパフォーマンスコンピューティング, Vol. 2014-HPC-146, No. 14, 2014年10月.
廣田悠輔，帯行列の一般化固有値問題向け分割統治法における摂動行列の分解について，神戸大学大学院システム情報学研究科計算科学専攻協定講座第8回協定講座シンポジウム，神戸大学，神戸市，2014年9月11日 (Poster)．
Takeshi Fukaya, “QR factorizations based on the Cholesky factorization”, the 8th Kobe University Cooperative Division Symposium, Kobe, Japan, September 11, 2014 (poster).
Takuma Kawamura, Yasuhiro Idomura, Hiroko Miyamura, Hiroshi Takemiya, and Toshiyuki Imamura, “Remote visualization of large-scale simulation results on K-computer using particle-based volume rendering”, Annual Meeting of the Atomic Energy Society of Japan, Kyoto, Japan, September 8, 2014.
Yuka Yanagisawa, Yuji Nakatsukasa, Takeshi Fukaya, Yusaku Yamamoto, Shinichi Oishi and Kannan Ramaseshan: Shifted Cholesky QR factorization, 2014 JSIAM Annual meeting, Tokyo, Japan, September 4, 2014.
Takeshi Fukaya, Yuji Nakatsukasa, Yuka Yanagisawa and Yusaku Yamamoto: Performance evaluation of the Cholesky QR factorization with reorthogonalization on large-scale parallel systems, 2014 JSIAM Annual meeting, Tokyo, Japan, September 3, 2014.
Takeshi Fukaya, Yusaku Yamamoto and Toshiyuki Imamura: A study on blocking techniques of Householder transformations and related communication-avoiding, Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2014), Niigata, Japan, July 28, 2014.
椋木大地, 今村俊幸, 高橋大介: KeplerアーキテクチャGPUにおける高速なSGEMVの実装, GTC Japan 2014, 2014年7月.
Takahiro Katagiri, Koichi Takayama, Takashi Yonemura, Hiroki Kumahora, Mitsuyoshi Igai, Junichi Kitagami, Yoshiyuki Eguchi, Takeshi Fukaya, Yusaku Yamamoto, Junichi Iwata, Kazuyuki Uchida, Satoshi Oshima, and Kengo Nakajima: Application of the communication-avoiding algorithm CAQR to the orthogonalization process in RSDFT and its evaluation, IPSJ SIG Tech. Rep. [High Performance Computing], Vol. 2014-HPC-144, No. 3, pp. 1-6, May 19, 2014.
Takeshi Fukaya: Communication-avoiding QR factorizations and related auto-tuning, ATOS9, Tokyo, Japan, May 5, 2014.
廣田悠輔，今村俊幸，一般化固有値問題向け分割統治法のsecular方程式の数値解法について, 日本応用数理学会「行列・固有値の解法とその応用」研究部会 2014年連合発表会，京都大学，2014年3月20日．
田村遼也，今村俊幸，仲谷栄伸: GPUへの完全オフロード化によるTSQRの高速化に関する研究，情報処理学会研究報告，「ハイパフォーマンスコンピューティング（HPC）」，Vol. 2014-HPC-143，No. 21，pp. 1–7，2014年2月24日，第143回情報処理学会ハイパフォーマンスコンピューティング研究会（HPC143），和倉温泉「あえの風」，2014年3月3日–4日.
岡田和人，岡本吉央，今村俊幸: マルチGPU環境におけるCRS形式疎行列・ベクトル積の入力行列の最適化による高速化，ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2014，p. 28，2013年12月31日，ハイパフォーマンスコンピューティングと計算科学シンポジウム（HPCS2014），一橋大学一橋講堂，2014年1月7日–8日（ポスター発表）.
白澤孝仁，今村俊幸，岡本吉央: 村田法のスレッド並列化によるマルチコアCPU上での実対称帯行列帯幅縮小操作の高速化，ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2014，p. 29，2013年12月31日，ハイパフォーマンスコンピューティングと計算科学シンポジウム（HPCS2014），一橋大学一橋講堂，2014年1月7日–8日（ポスター発表）.
林熙龍，今村俊幸，岡本吉央: d-Spline関数を用いたGEMVカーネルの性能チューニング，ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2014，p. 30，2013年12月31日，ハイパフォーマンスコンピューティングと計算科学シンポジウム（HPCS2014），一橋大学一橋講堂，2014年1月7日–8日（ポスター発表）.
黒田明義，大井憲行，井上晃，村井均，山崎隆浩，大野隆央，今村俊幸，南一生: 高次元メッシュ/トーラスネットワークにおける実アプリケーションの通信最適化手法―「京」上のTofuネットワークを例に―，ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2014，pp. 97–105，2013年12月31日，ハイパフォーマンスコンピューティングと計算科学シンポジウム（HPCS2014），一橋大学一橋講堂，2014年1月7日–8日．
Takeshi Fukaya and Toshiyuki Imamura: A study on auto-tuning for the structured QR factorization appearing in the TSQR algorithm, the 19th Annual Meeting of Japan Society for Computational Engineering and Science, Proc. JSCES, Vol.19, 2014.

2013

Journal Articles | 学術論文

T. Imamura, S. Yamada, and M. Machida, “A High Performance SYMV Kernel on a Fermi-core GPU”, High Performance Computing for Computational Science – VECPAR 2012, Lecture Notes in Computer Science (LNCS) 7851, pp. 59–71, 2013.

Presentations at International Conference | 国際会議発表

T. Imamura, “Research Activities in AICS towards post Peta-scale Numerical Libraries”, the 4th AICS International Symposium, Kobe, Japan, Dec. 2-3, 2013.
C. Bi, K. Ono, K. L. Ma, H. Wu and T. Imamura, “Proper orthogonal decomposition based parallel compression for visualizing big data on the K computer”, Berk Geveci, Hanspeter Pfister, Venkatram Vishwanath (Eds.): Proc. LDAV2013, pp. 121–122, 2013, IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV 2013), Atlanta, Georgia, USA, Oct. 13–14, 2013 (poster, peer-reviewed).
T. Imamura, “Automatic Tuning for GPU BLAS kernels”, Dagstuhl Seminar 13401 “Automatic Application Tuning for HPC Architectures”, Schloss Dagstuhl, Saarbrücken, Germany, Oct. 2013.
T. Imamura, “Performance Auto-Tuning in Memory-Bound CUDA-BLAS Kernel”, 2013 @^2 HPSC, Conference on Advanced Topics and Auto Tuning in High Performance Scientific Computing, National Taiwan University, Taipei, Taiwan, Mar. 27–29, 2013.
T. Imamura, “Beyond Peta-scale Computing from the Viewpoint of Numerical Libraries”, The 3rd AICS International Symposium, Computer and Computational Science for Exascale Computing, RIKEN AICS, Kobe, Japan, Feb. 28–Mar. 1, 2013.
T. Fukaya, T. Imamura and Y. Yamamoto, “Performance Modeling of the Eigen-K Dense Eigensolver on Massively Parallel Machines”, SIAM Conference on Computational Science and Engineering (CSE13), The Westin Boston Waterfront, Boston, Massachusetts, Feb. 25–Mar. 1, 2013.

Presentations at Domestic Conference | 国内学会発表

深谷猛，山本有作，今村俊幸: グラム・シュミットの直交化に基づくTSQRアルゴリズムとその性能評価，日本応用数理学会「行列・固有値問題の解法とその応用」研究部会第16回研究会，2013年12月26日．
今村俊幸: 数値線形計算に現れる通信削減アルゴリズムについて，日本応用数理学会「行列・固有値問題の解法とその応用」研究部会第16回研究会，東京大学，2013年12月26日(招待講演).
今村俊幸: 大規模並列固有値ソルバー～京での現状からエキサまで～，今後のHPC(基盤技術と応用) に関するワークショップ，長崎市図書館，2013年12月8日–9日（基調講演）.
廣田悠輔，一般化固有値問題向け分割統治法とその帯行列向け拡張について，第11回計算数学研究会，国民宿舎ブランナールみささ，鳥取県三朝町，2013年11月2–4日．
深谷猛，山本有作，今村俊幸: 大規模並列環境における縦長行列のQR分解の性能評価，第11回計算数学研究会，ブランナールみささ，鳥取県三朝町，2013年11月3日–4日．
大井祥栄, 龍野智哉, 生野壮一郎, RPIMより得られる形状関数を用いた電磁場解析における安定性の数値的検証, 日本応用数理学会2013年度年会, アクロス福岡(福岡県福岡市), 2013/9/11.
深谷猛，今村俊幸，山本有作: 京コンピュータにおける対称密行列向け固有値計算プログラムの性能評価と性能予測，日本応用数理学会 2013年度年会，アクロス福岡，2013年9月9日–11日．
廣田悠輔，今村俊幸，帯行列の一般化固有値問題向け分割統治法，並列/分散/協調処理に関するサマー・ワークショップ（SWoPP 2013）日本応用数理学会「行列・固有値の解法とその応用」研究部会，北九州国際会議場，2013年7月31日–8月2日．
今村俊幸: Roadmap to Eigensolver on a GPU-cluster，GTC Japan 2013，テクニカルセッション（東京工業大学GSIC GPUコンピューティング研究会），東京ミッドタウン，2013年7月30日(招待講演) (in Japanese).
深谷猛，今村俊幸，山本有作: 超並列環境における密行列計算プログラムの性能モデリングに向けた検討，情報処理学会研究報告，「ハイパフォーマンスコンピューティング（HPC）」，Vol. 2013-HPC-140，No. 41，pp. 1–8，2013年7月24日， 2013年並列／分散／協調処理に関する『北九州』サマー・ワークショップ（SWoPP北九州2013），北九州国際会議場，2013年7月31日–8月2日．
今村俊幸，山田進，町田昌彦: O(メガ)コア級超並列固有値ソルバの自動チューニングによる戦略，第18回計算工学会講演会，東京大学生産技術研究所，2013年6月19日–21日，計算工学会論文集（CD-ROM），Vol. 18，D-13-5，2013.
佐々成正，山田進，町田昌彦，今村俊幸，奥田洋司: QPBLAS-GPUの開発と性能評価，第18回計算工学会講演会，東京大学生産技術研究所，2013年6月19日–21日，計算工学会論文集（CD-ROM），Vol. 18，D-13-5，2013．
今村俊幸: 第三世代NVIDIA GPUを用いた高性能固有値ソルバの開発，「コンピューティクスによる物質デザイン: 複合相関と非平衡ダイナミックス」，平成25年度第2回研究会，東京大学，2013年3月11日．
今村俊幸，内海貴弘，林熙龍，山田進，町田昌彦: Fermi, Kepler複数世代GPUに対するSYMVカーネルの性能チューニング，情報処理学会研究報告，「ハイパフォーマンスコンピューティング（HPC）」，Vol. 2013-HPC-138，No. 7，pp. 1–7，2013年2月14日，第138回ハイパフォーマンスコンピューティング研究発表会（HPC138），芦原温泉清風荘，2013年2月21-22日．

2012

Presentations at International Conference | 国際会議発表

T. Imamura, S. Yamada and M. Machida, “Eigen-K: high performance eigenvalue solver for symmetric matrices developed for K computer”, PMAA2012, London, UK, Jun. 2012.
T. Imamura, S. Yamada and M. Machida, “Preliminary Report for a High Precision Distributed Memory Parallel Eigenvalue Solver”, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC12), Salt Lake City, USA, 2012 (poster, peer-reviewed).
Y. Idomura, M. Nakata, S. Yamada, T. Imamura, T. Watanabe, M. Machida, M. Nunami, H. Inoue, S. Tsutsumi, I. Miyoshi, and N. Shida, “Communication Overlap Techniques for Improved Strong Scaling of Gyrokinetic Eulerian Code Beyond 100k Cores on K-Computer”, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC12), Salt Lake City, USA, 2012 (poster, peer-reviewed).

Presentations at Domestic Conference | 国内学会発表

山田進，佐々成正，今村俊幸，町田昌彦: 4倍精度基本線形代数ルーチン群QPBLASの紹介とアプリケーションへの応用，情報処理学会研究報告，「ハイパフォーマンスコンピューティング（HPC）」，Vol. 2012-HPC-137，No. 23，pp. 1–6，2012年12月6日，第194回計算機アーキテクチャ・第137回ハイパフォーマンスコンピューティング合同研究発表会（HOKKE-20），北海道大学情報基盤センター，2012年12月14日．
今村俊幸: 「京」における数値計算ソフトウェア整備について，「産業における応用数理」研究会，日本応用数理学会，筑波大学東京キャンパス，2012年12月4日．
今村俊幸，山田進，町田昌彦: ポスト・ペタスケール時代の密固有値計算ソルバについて，日本応用数理学会2012年度年会講演予稿集，pp. 279–280，日本応用数理学会2012年度年会，稚内全日空ホテル，2012年8月31日.
今村俊幸，吉田剛啓，田村遼也，近藤大貴，山田進，町田昌彦: マルチコアを考慮した通信隠蔽手法の自動チューニング機能付き高性能固有値ソルバの開発，情報処理学会研究報告，「ハイパフォーマンスコンピューティング（HPC）」， Vol. 2012-HPC-135，No. 19，pp. 1–8，2012年7月25日， 2012年並列／分散／協調処理に関する『鳥取』サマー・ワークショップ（SWoPP鳥取2012），とりぎん文化会舘，2012年7月25日.
今村俊幸, 山田進, 町田昌彦: Eigen-Exa:ポストペタスケール環境での密行列固有値ソルバー開発, 第17回計算工学会講演会，京都教育文化センター, 2012年5月30日.
