The 243rd R-CCS Cafe - part 1

Title

Challenges of Scaling Deep Learning on HPC Systems

Details
Date Fri, Feb 3, 2023
Time 4:00 pm - 4:20 pm (5 pm - 5:20 pm Discussion, 5:20 pm - Free discussion (optional))
City Online
Place

Online seminar on Zoom

  • If you are not affiliated with R-CCS and would like to attend R-CCS Cafe, please email us at r-ccs-cafe[at]ml.riken.jp.
Language Presentation Language: English
Presentation Material: English
Speakers

Mohamed WAHIB

High Performance Artificial Intelligence Systems Research Team
Team Leader

Abstract

Machine learning, and deep learning training in particular, is becoming one of the main workloads on HPC systems. Moreover, the scientific computing community is increasingly adopting modern deep learning approaches in its workflows. When HPC practitioners attempt to scale a typical HPC workload, they are usually challenged by one particular bottleneck. Scaling deep learning, on the other hand, can be limited by several different bottlenecks: memory capacity, communication, I/O, compute, and so on. In this talk we give an overview of the bottlenecks in scaling deep learning and highlight efforts to address some of them.

Important Notes

  • Please turn off your video and microphone when you join the meeting.
  • The broadcast may be interrupted or terminated depending on network conditions or any other unexpected event.
  • The program schedule and contents may be modified without prior notice.
  • Depending on your device and network environment, you may not be able to watch the session.
  • All rights to the broadcast material belong to the organizer and the presenters. Copying, modifying, or redistributing all or part of the broadcast material without prior permission from RIKEN is prohibited.

(Jan 25, 2023)