

Invited Talk
in
Workshop: Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization

Navigating the Landscape of Enormous AI Model Training

Yang You

Sat 16 Dec 8:30 a.m. PST — 9 a.m. PST

Abstract:

The proliferation of large Transformer-based models has outpaced advances in hardware, creating an urgent need to distribute enormous models across multiple GPUs. Despite this growing need, there are still no established best practices for selecting an optimal parallelization strategy, because doing so requires extensive expertise in High-Performance Computing (HPC), Deep Learning (DL), and distributed systems. These challenges have motivated both AI and HPC developers to tackle pivotal questions: How can the training and inference efficiency of large models be improved to reduce costs? How can larger AI models be accommodated, even with limited resources? What measures can be taken to give the broader community access to large models and large-scale applications? In this talk, I will discuss potential solutions to these challenges by exploring hybrid parallelisms, heterogeneous memory management, and the design of user-friendly frameworks such as our open-source systemic solution: Colossal-AI (https://github.com/hpcaitech/ColossalAI).
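To make the "hybrid parallelism" theme concrete, the sketch below illustrates one common combination, a 2D grid of data parallelism and tensor parallelism, using plain PyTorch process groups. It is a minimal, hypothetical example for orientation only: it is not the Colossal-AI API, and the helper name, grid sizes, and launch assumptions (torchrun with a world size divisible by tp_size) are choices made for illustration.

# Minimal sketch: split world ranks into a (data-parallel x tensor-parallel) grid.
# Hypothetical illustration of hybrid parallelism; not the Colossal-AI API.
import os
import torch
import torch.distributed as dist

def build_2d_groups(tp_size: int):
    """Return (tensor-parallel group, data-parallel group) for this rank."""
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    dp_size = world_size // tp_size

    tp_group, dp_group = None, None
    # Ranks in the same row hold shards of one model replica (tensor parallel).
    for dp in range(dp_size):
        ranks = [dp * tp_size + tp for tp in range(tp_size)]
        group = dist.new_group(ranks)
        if rank in ranks:
            tp_group = group
    # Ranks in the same column hold different replicas and average gradients (data parallel).
    for tp in range(tp_size):
        ranks = [dp * tp_size + tp for dp in range(dp_size)]
        group = dist.new_group(ranks)
        if rank in ranks:
            dp_group = group
    return tp_group, dp_group

if __name__ == "__main__":
    dist.init_process_group("nccl")  # assumes launch via torchrun
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    tp_group, dp_group = build_2d_groups(tp_size=2)
    # A training step would shard large matmuls across tp_group and
    # all-reduce gradients across dp_group; pipeline parallelism and
    # heterogeneous (CPU/GPU) memory management would be layered on top.

Frameworks such as Colossal-AI aim to hide exactly this kind of process-group bookkeeping behind a user-friendly interface, which is one of the design questions the talk addresses.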

Speaker's Bio: Yang You is a Presidential Young Professor at the National University of Singapore. He received his Ph.D. in Computer Science from UC Berkeley under Prof. James Demmel. Yang's research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. He is a winner of the IPDPS 2015 Best Paper Award (0.8%), the ICPP 2018 Best Paper Award (0.3%), and the ACM/IEEE George Michael HPC Fellowship. Yang is also a Siebel Scholar and a winner of the Lotfi A. Zadeh Prize. He was also named to the Forbes 30 Under 30 Asia list (2021) for young leaders and received the IEEE-CS TCHPC Early Career Award.
