R-CCS Cafe Special Edition (June 10, 2024)

Details
Date: Monday, June 10, 2024
Time: 14:50 - 17:20
City: Kobe, Hyogo / Online
Venue

Center for Computational Science (R-CCS), 6th-floor auditorium / Remote seminar via Zoom

Language: Presentation and slides both in English
Speakers

Miquel Pericas

Associate Professor, Chalmers University

Ivan R. Ivanov

High-Performance Computing Modeling Research Team
Junior Research Associate

Talk Titles and Abstracts

1st Speaker: Miquel Pericas


Title:
RISC-V for European Supercomputing: Challenges and Opportunities
Abstract:
Europe's push for an open-source, independent supercomputing ecosystem hinges on maximizing the potential of RISC-V for High-Performance Computing (HPC). While RISC-V's flexibility and customizability are promising, extracting peak performance from its Vector (RVV) extension remains a hurdle. Like Arm SVE, RVV uses a vector-length-agnostic architecture, so the two share many performance optimization challenges.
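
As a rough illustration of what "vector length agnostic" means in the abstract above, the following plain-Python sketch simulates a strip-mined loop that never hard-codes the vector width. It is not RVV or SVE code: `vla_saxpy` and its `vlmax` parameter are hypothetical names, and the `min(...)` line only stands in for the role that an instruction such as `vsetvli` plays on real RVV hardware.

```python
def vla_saxpy(a, x, y, vlmax):
    """Compute a*x + y element-wise with a strip-mined loop.

    vlmax is a hypothetical stand-in for the hardware vector length,
    which a vector-length-agnostic program only learns at run time.
    """
    n = len(x)
    out = [0.0] * n
    i = 0
    while i < n:
        # Ask "how many elements can I process this iteration?"
        # (simulated here; real hardware would answer this query).
        vl = min(vlmax, n - i)
        # One simulated vector operation over vl lanes.
        for lane in range(vl):
            out[i + lane] = a * x[i + lane] + y[i + lane]
        i += vl
    return out
```

The same function gives the same result for any `vlmax`, which is the portability property the abstract refers to. The tuning difficulty is that the best-performing loop structure may still depend on the actual vector length.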

2nd Speaker: Ivan R. Ivanov


Title:
Retargeting and Respecializing GPU Workloads for Performance Portability
Abstract:
To come close to peak performance, accelerators like GPUs require significant architecture-specific tuning that accounts for the availability of shared memory, parallelism, tensor cores, etc. Unfortunately, the pursuit of higher performance and lower costs has led to a significant diversification of architecture designs, even from the same vendor. This creates the need for performance portability across different GPUs, which is especially important for programs written in a particular programming model with a certain architecture in mind. Even when a program can be seamlessly executed on a different architecture, it may suffer a performance penalty because it is not sized appropriately for the available hardware resources, such as fast memory and registers, let alone because it does not use newer advanced features of the architecture. We propose a new approach to improving the performance of (legacy) CUDA programs on modern machines by automatically adjusting the amount of work each parallel thread does and the amount of memory and register resources it requires. By operating within the MLIR compiler infrastructure, we are also able to target AMD GPUs by performing automatic translation from CUDA while simultaneously adjusting the program granularity to fit the size of the target GPUs. Combined with autotuning assisted by the platform-specific compiler, our approach demonstrates a 27% geomean speedup on the Rodinia benchmark suite over the baseline CUDA implementation, as well as performance parity between similar NVIDIA and AMD GPUs executing the same CUDA program.
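
To make "adjusting the amount of work each parallel thread does" concrete, here is a minimal plain-Python simulation of a GPU-style launch in which each logical thread handles a configurable number of elements. It is only a sketch of the general idea, not the speaker's MLIR-based implementation: `launch_scaled`, `block_size`, and `coarsening` are hypothetical names chosen for illustration.

```python
def launch_scaled(data, scale, block_size, coarsening):
    """Simulate a kernel that multiplies every element by `scale`.

    Each simulated thread processes `coarsening` elements, so a larger
    factor means fewer threads doing more work each. Returns the result
    and the total number of simulated threads launched.
    """
    n = len(data)
    out = [None] * n
    elems_per_block = block_size * coarsening
    # Ceiling division: enough blocks to cover all n elements.
    num_blocks = (n + elems_per_block - 1) // elems_per_block
    for block in range(num_blocks):
        for thread in range(block_size):
            for k in range(coarsening):
                # Stride by block_size so that neighbouring threads
                # touch neighbouring elements on every step k.
                idx = block * elems_per_block + k * block_size + thread
                if idx < n:  # bounds guard for the last partial block
                    out[idx] = scale * data[idx]
    return out, num_blocks * block_size
```

For example, with 10 elements and `block_size=4`, a factor of 1 launches 12 simulated threads while a factor of 2 launches 8, and both produce identical output. An autotuner of the kind mentioned in the abstract would search over such factors to find the one that best fits a given GPU.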

Notes

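  • Please turn off your PC microphone audio and video when participating.
  • The broadcast may unavoidably be cancelled or interrupted depending on the venue environment or connection conditions on the day.
  • The program content and times are subject to change without notice.
  • Depending on your device or network environment, you may be unable to view the broadcast.
  • Copyright in the internet broadcast belongs to the organizer and the presenters. Reproducing, copying, or modifying the distributed video and audio, or their content, on other websites or in other works without permission from RIKEN is prohibited.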

(May 31, 2024)