R-CCS Cafe - Special Edition (Jun 10, 2024)

Details
Date: Mon, Jun 10, 2024
Time: 2:50 pm - 5:20 pm
City: Kobe, Japan / Online
Place: Lecture Hall (6th floor) at R-CCS and online seminar on Zoom

  • If you are not affiliated with R-CCS and would like to attend R-CCS Cafe, please email us at r-ccs-cafe[at]ml.riken.jp.
Language
  • Presentation Language: English
  • Presentation Material: English
Speakers

Miquel Pericas

Chalmers University
Associate Professor

Ivan R. Ivanov

SPR team, RIKEN R-CCS
Junior Research Associate

Talk Titles and Abstracts

1st Speaker: Miquel Pericas

Title:
RISC-V for European Supercomputing: Challenges and Opportunities
Abstract:
Europe's push for an open-source, independent supercomputing ecosystem hinges on maximizing the potential of RISC-V for High-Performance Computing (HPC). While RISC-V's flexibility and customizability are promising, extracting peak performance from its Vector (RVV) extension remains a hurdle. Like ARM SVE, RVV uses a vector-length-agnostic architecture, which presents similar performance-optimization challenges.

2nd Speaker: Ivan R. Ivanov

Title:
Retargeting and Respecializing GPU Workloads for Performance Portability
Abstract:
To come close to peak performance, accelerators like GPUs require significant architecture-specific tuning that understands the availability of shared memory, parallelism, tensor cores, etc. Unfortunately, the pursuit of higher performance and lower costs has led to a significant diversification of architecture designs, even from the same vendor. This creates a need for performance portability across different GPUs, which is especially important for programs written in a particular programming model with a certain architecture in mind. Even when such a program can be seamlessly executed on a different architecture, it may suffer a performance penalty because it is not sized appropriately for the available hardware resources such as fast memory and registers, let alone because it does not use newer advanced features of the architecture. We propose a new approach to improving the performance of (legacy) CUDA programs on modern machines by automatically adjusting the amount of work each parallel thread does and the amount of memory and register resources it requires. By operating within the MLIR compiler infrastructure, we are also able to target AMD GPUs by automatically translating from CUDA while simultaneously adjusting the program granularity to fit the size of the target GPU. Combined with autotuning assisted by the platform-specific compiler, our approach demonstrates a 27% geomean speedup on the Rodinia benchmark suite over the baseline CUDA implementations, as well as performance parity between similar NVIDIA and AMD GPUs executing the same CUDA program.

Important Notes

  • Please turn off your video and microphone when you join the meeting.
  • The broadcast may be interrupted or terminated depending on network conditions or other unexpected events.
  • The program schedule and contents may be modified without prior notice.
  • Depending on your device and network environment, you may be unable to watch the session.
  • All rights concerning the broadcast material belong to the organizer and the presenters; copying, modifying, or redistributing all or part of the broadcast material without RIKEN's prior permission is prohibited.

(May 31, 2024)