Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization
Julia Gusak · Jean Kossaifi · Alena Shilova · Rocco Sedona · Cristiana Bentes · Animashree Anandkumar · Olivier Beaumont
Room 243 - 245
Unlock neural network training's potential for good and science! Enhance computational efficiency, scalability, and resource optimization. Join HPC and AI experts to tackle challenges in theory and applications.
Schedule
Sat 6:15 a.m. - 6:50 a.m. | Poster Placement (Break)

Sat 6:50 a.m. - 7:00 a.m. | Opening Remarks (Talk)
Julia Gusak

Sat 7:00 a.m. - 7:30 a.m. | A Data-Centric View on Workflows that Couple HPC with Large-Scale Models (Invited Talk)
Abstract: In recent years, scientific computing workloads at HPC facilities have been undergoing a significant shift. While traditionally dominated by numerical simulations, these facilities are increasingly handling AI/ML applications for training and inference, processing and producing ever-increasing amounts of scientific data. Despite the focus on optimizing the execution of new AI/HPC workflows, little attention has been paid to the I/O runtime challenges they present. This talk aims to address that gap by analyzing these emerging trends from an I/O perspective. We will explore the performance of multilayer high-performance I/O systems under the strain of these new workflows, which combine traditional HPC techniques with AI in new and challenging ways.
Speaker's Bio: Ana Gainaru is a computer scientist in the CSM division at Oak Ridge National Laboratory, working on data management and performance optimization for large-scale scientific workflows, with a focus on codes coupling traditional HPC with AI. She received her PhD from the University of Illinois at Urbana-Champaign, working on fault tolerance and scheduling for large-scale systems. In her current position she works with application developers in fusion, neutron scattering, and materials science to deploy digital twins and large models and to improve their performance at scale.
Ana Gainaru

Sat 7:30 a.m. - 8:00 a.m. | Rematerialization Algorithms for Memory-efficient Learning (Invited Talk)
Abstract: The training phase of Deep Neural Networks is often a very memory-intensive procedure, where large amounts of intermediate data have to be kept in memory during one iteration. One possible approach to reduce memory usage is rematerialization, aka gradient checkpointing, where some intermediate data are recomputed when needed rather than kept in memory. This provides a tradeoff between memory usage and recomputation time. In this talk I will present several approaches for the optimization problem, where one wants to minimize the recomputation time given a fixed memory budget. The corresponding algorithms have been implemented in easy-to-use libraries for the PyTorch framework, which can significantly reduce memory usage with reasonable overhead.
Speaker's Bio: Lionel Eyraud-Dubois received his PhD degree in computer science from the Université de Grenoble. He is currently a full-time researcher with Inria Bordeaux Sud-Ouest in the Topal team. His main research interests encompass combinatorial optimization and operations research techniques for scheduling and resource allocation problems in high-performance computing systems, including optimizing the training and inference processes of Deep Neural Networks.
Lionel Eyraud-Dubois

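The tradeoff described above is exposed in stock PyTorch through its gradient checkpointing utilities. A minimal sketch using the built-in uniform segmenting (the talk's algorithms instead choose which activations to rematerialize under an explicit memory budget, which this sketch does not do):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# 16-layer toy network; only activations at segment boundaries are stored.
model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(16)])
x = torch.randn(32, 1024, requires_grad=True)

y = checkpoint_sequential(model, 4, x)  # 4 segments: keep roughly 1/4 of activations
loss = y.sum()
loss.backward()  # interior activations are recomputed (rematerialized) here
```
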
Sat 8:00 a.m. - 8:30 a.m. | Coffee Break (Break)

Sat 8:30 a.m. - 9:00 a.m. | Navigating the Landscape of Enormous AI Model Training (Invited Talk)
Abstract: The proliferation of large models based on Transformers has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this increasing need, the absence of established best practices for selecting an optimal strategy persists, owing to the extensive expertise required in High-Performance Computing (HPC), Deep Learning (DL), and distributed systems. These challenges have motivated both AI and HPC developers to delve into pivotal questions: How can the training and inference efficiency of large models be enhanced to minimize costs? How can larger AI models be accommodated, even with limited resources? What measures can be taken to facilitate broader community access to large models and large-scale applications? In this talk, I will discuss potential solutions to these challenges by exploring hybrid parallelisms, heterogeneous memory management, and the design of user-friendly frameworks such as our open-source systemic solution: Colossal-AI (https://github.com/hpcaitech/ColossalAI).
Speaker's Bio: Yang You is a Presidential Young Professor at the National University of Singapore. He received his Ph.D. in Computer Science from UC Berkeley under Prof. James Demmel. Yang's research interests include Parallel/Distributed Algorithms, High Performance Computing, and Machine Learning. He is a winner of the IPDPS 2015 Best Paper Award (0.8%), the ICPP 2018 Best Paper Award (0.3%), and the ACM/IEEE George Michael HPC Fellowship. Yang is also a Siebel Scholar and a winner of the Lotfi A. Zadeh Prize. He also made the Forbes 30 Under 30 Asia list (2021) for young leaders and won the IEEE-CS TCHPC early career award.
Yang You

Sat 9:00 a.m. - 9:30 a.m. | Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models (Invited Talk)
Abstract: Deep Learning (DL) is driving unprecedented progress in a wide range of Artificial Intelligence domains, including natural language processing, vision, speech, and multimodal applications. However, sustaining this AI revolution requires practical solutions to the extreme demands of model scaling on the compute, memory, communication and storage components of modern computing hardware. To address this challenge, we created a deep learning optimization library called DeepSpeed to make distributed model training and inference efficient, effective, and easy on commodity hardware. This talk will focus on DeepSpeed training optimizations, particularly on ZeRO and DeepSpeed-MoE, which help to address the memory and compute requirements of extreme model scaling.
Speaker's Bio: Olatunji (Tunji) Ruwase is a co-founder and Principal Research Sciences Manager of the DeepSpeed project at Microsoft. His broad industry and research background spans compilers, operating systems, and hardware accelerators. He is currently interested in building systems, convergence optimizations, and frameworks for distributed training and inference of deep learning models. His research results on DL training, inference, and hyperparameter search are used in multiple Microsoft systems and products, such as Bing, Ads, HyperDrive, and Catapult.
Olatunji Ruwase

Sat 9:30 a.m. - 10:00 a.m. | Contributed Talks (Talk)
Presentations of papers accepted as orals at WANT@NeurIPS; these papers will also be presented during the poster sessions.

Sat 9:31 a.m. - 9:36 a.m. | Training and inference of large language models using 8-bit floating point (Contributed Talk & Poster)
FP8 formats are gaining popularity to boost the computational efficiency of training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect has yet to be addressed for FP8. This paper presents a methodology to select the scalings for FP8 linear layers, based on dynamically updating per-tensor scales for the weights, gradients and activations. We apply this methodology to train and validate large language models of the GPT and Llama 2 families using FP8, for model sizes ranging from 111M to 70B. To facilitate the understanding of the FP8 dynamics, our results are accompanied by plots of the per-tensor scale distribution for weights, activations and gradients during both training and inference.
Sergio Perez · Yan Zhang · James Briggs · Charles Blake · Josh Levy-Kramer · Paul Balanca · Carlo Luschi · Stephen Barlow · Andrew Fitzgibbon

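For intuition, a toy sketch of dynamic per-tensor scaling (our reading of the abstract, not the paper's code): pick a scale that maps the tensor's absolute maximum near the top of FP8 E4M3's representable range, and carry the scale alongside the tensor for dequantization.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value

def quantize_per_tensor(t: torch.Tensor):
    amax = t.abs().max().clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax                 # dynamically updated per-tensor scale
    t_scaled = (t * scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
    # clamp stands in for a real FP8 cast, which would also round the mantissa
    return t_scaled, scale

def dequantize(t_scaled: torch.Tensor, scale: torch.Tensor):
    return t_scaled / scale

w = torch.randn(4096, 4096) * 0.02              # e.g. a linear layer's weights
w_q, s = quantize_per_tensor(w)
w_back = dequantize(w_q, s)                     # close to w, up to rounding/clipping
```
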
Sat 9:37 a.m. - 9:42 a.m. | MatFormer: Nested Transformer for Elastic Inference (Contributed Talk & Poster)
Transformer models are deployed in a wide range of settings, from multi-accelerator clusters to standalone mobile phones. The diverse inference constraints in these scenarios necessitate practitioners to train foundation models such as PaLM 2, Llama, & ViTs as a series of models of varying sizes. Due to significant training costs, only a select few model sizes are trained and supported, limiting more fine-grained control over relevant tradeoffs, including latency, cost, and accuracy. This work introduces MatFormer, a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints. Each Feed Forward Network (FFN) block of a MatFormer model is jointly optimized with a few nested smaller FFN blocks. This training procedure allows for the Mix'n'Match of model granularities across layers -- i.e., a trained universal MatFormer model enables extraction of hundreds of accurate smaller models, which were never explicitly optimized. We empirically demonstrate MatFormer's effectiveness across different model classes (decoders & encoders), modalities (language & vision), and scales (up to 2.6B parameters). We find that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B, each exhibiting comparable validation loss and one-shot downstream evaluations to their independently trained counterparts. Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval. Finally, we showcase that speculative decoding with the accurate and consistent submodels extracted from MatFormer can further reduce inference latency.
Fnu Devvrit · Sneha Kudugunta · Aditya Kusupati · Tim Dettmers · Kaifeng Chen · Inderjit Dhillon · Yulia Tsvetkov · Hannaneh Hajishirzi · Sham Kakade · Ali Farhadi · Prateek Jain

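A sketch of the nested-FFN idea as we read it from the abstract: sub-models reuse a prefix of the full FFN's hidden units, so one weight matrix serves several granularities. Shapes and the prefix scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NestedFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x, frac=1.0):
        # Use only the first `frac` fraction of hidden units (a nested sub-FFN).
        h = int(self.w_in.out_features * frac)
        z = torch.relu(x @ self.w_in.weight[:h].T + self.w_in.bias[:h])
        return z @ self.w_out.weight[:, :h].T + self.w_out.bias

ffn = NestedFFN()
x = torch.randn(8, 512)
full = ffn(x, frac=1.0)    # largest granularity
small = ffn(x, frac=0.25)  # "Mix'n'Match"-style extracted sub-model, no retraining
```
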
Sat 9:43 a.m. - 9:48 a.m. | Sparse Backpropagation for MoE Training (Contributed Talk & Poster)
One defining characteristic of Mixture-of-Experts (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability. However, backpropagation, the cornerstone of deep learning, requires dense computation, thereby posing challenges in MoE gradient computations. Here, we introduce SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Unlike typical MoE training, which strategically neglects certain gradient terms for the sake of sparse computation and scalability, SparseMixer provides scalable gradient approximations for these terms, enabling reliable gradient estimation in MoE training. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applying SparseMixer to Switch Transformer on both pre-training and machine translation tasks, SparseMixer showcases considerable performance gains, accelerating training convergence by up to 2 times.
Liyuan Liu · Jianfeng Gao · Weizhu Chen

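The abstract grounds SparseMixer in a numerical-ODE view and names the mid-point method as its second-order ingredient. For reference, that ingredient in isolation (how SparseMixer turns it into a gradient estimator for expert routing is in the paper, not this sketch):

```python
def euler_step(f, x, h):
    return x + h * f(x)            # first-order: O(h^2) local error

def midpoint_step(f, x, h):
    x_mid = x + 0.5 * h * f(x)     # probe the midpoint first
    return x + h * f(x_mid)        # second-order: O(h^3) local error

f = lambda x: -x                   # test ODE dx/dt = -x, exact solution e^{-t}
x_e, x_m = 1.0, 1.0
for _ in range(10):                # integrate to t = 1 with step h = 0.1
    x_e = euler_step(f, x_e, 0.1)
    x_m = midpoint_step(f, x_m, 0.1)
# x_m (~0.3685) is much closer to exp(-1) ~ 0.3679 than x_e (~0.3487)
```
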
Sat 9:49 a.m. - 9:54 a.m. | Efficient Parallelization Layouts for Large-Scale Distributed Model Training (Contributed Talk & Poster)
Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence parallelism. In this work, we conduct a comprehensive ablation study of possible training configurations for large language models. We distill this large study into several key recommendations for the most efficient training. For instance, we find that using a micro-batch size of 1 usually enables the most efficient training layouts. Larger micro-batch sizes necessitate activation checkpointing or higher degrees of model parallelism and also lead to larger pipeline bubbles. Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes, most notably a Model FLOPs utilization of 70.5% when training a Llama-13B model.
Johannes Hagemann · Samuel Weinbach · Konstantin Dobler · Maximilian Schall · Gerard de Melo

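For context on the headline number, Model FLOPs Utilization (MFU) can be sanity-checked with the common ~6N FLOPs-per-token training approximation for an N-parameter decoder. All concrete numbers below are illustrative assumptions, not the paper's measurements.

```python
def mfu(n_params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
    model_flops_per_sec = 6 * n_params * tokens_per_sec  # fwd + bwd approximation
    return model_flops_per_sec / (n_gpus * peak_flops_per_gpu)

# Hypothetical: a 13B model at 22,500 tokens/s on 8 GPUs with 312 TFLOP/s peak (A100 BF16).
print(f"MFU = {mfu(13e9, 22_500, 8, 312e12):.1%}")  # ~70.3%
```
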
Sat 9:55 a.m. - 10:00 a.m. | CoTFormer: More Tokens With Attention Make Up For Less Depth (Contributed Talk & Poster)
The race to continually develop ever larger and deeper foundation models is underway. However, techniques like the Chain-of-Thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance. In this study, we establish an approximate parallel between using a chain of thought and employing a deeper transformer. Building on this insight, we introduce CoTFormer, a transformer variant that employs an implicit CoT-like mechanism to achieve performance comparable to that of a deeper model. Our empirical findings demonstrate the effectiveness of CoTFormers, as they significantly outperform larger standard transformers.
Amirkeivan Mohtashami · Matteo Pagliardini · Martin Jaggi

Sat 10:00 a.m. - 11:30 a.m. | Lunch (Break)

Sat 11:30 a.m. - 12:00 p.m. | Poster Session

Sat 12:00 p.m. - 12:30 p.m. | Crafting Computational Efficiency for Large Models: Training Recipes, Scaling Strategies and Sparsity Sorcery with Specialized Hardware (Invited Talk)
Abstract: Large models are shifting “what’s possible” with AI. Brute-force scaling of model parameter count increases model capacity and, when presented with enough training data, has shown remarkable results. However, the advantages of large-scale models come at the price of a steep increase in system complexity and infrastructure cost. Training and serving these models is an engineering challenge and is very expensive. Even minor errors in model design or training procedure can result in significant waste of resources. At Cerebras we have trained our share of large language models and learned along the way how to train these models efficiently to get “the biggest bang for the buck”. In this talk we will share our experience and insights from training various LLMs. In addition to techniques for compute-efficient training of dense models, we will look into the benefits of sparse training and inference on Cerebras hardware, designed to take full advantage of all types of sparsity.
Speaker's Bio: Natalia Vassilieva is a Sr. Director of Product at Cerebras Systems, a computer systems company dedicated to accelerating deep learning. She leads the vision and strategy for Cerebras products, market, application, and algorithm analysis for machine learning use cases. Her focus is machine learning and artificial intelligence, analytics, and application-driven software-hardware optimization and co-design. Prior to joining Cerebras, Natalia was a Sr. Research Manager at Hewlett Packard Labs, where she led the Software and AI group and served as the head of HP Labs Russia from 2011 until 2015. Prior to Hewlett Packard, she was an Associate Professor at St. Petersburg State University in Russia and worked as a software engineer for several IT companies. Natalia holds a Ph.D. in computer science from St. Petersburg State University.
Natalia Vassilieva

Sat 12:30 p.m. - 1:00 p.m. | Invited Talk by Databricks (Invited Talk)

Sat 1:00 p.m. - 1:30 p.m. | Coffee Break (Break)

Sat 1:30 p.m. - 2:00 p.m. | Efficient LLM Training and Inference on GPUs (Invited Talk)
Abstract: Training and inference of large transformer models is one of the most important computational challenges of modern AI. Systems for training these models must be highly scalable and run at extreme efficiency, because the amount of work necessary to converge a model can be extraordinarily large. Inference needs to be fast and accommodate different query sizes. In this talk, I'll discuss the work we have been doing at NVIDIA to optimize systems for Large Language Model training and inference on GPUs. I will present the different parallelism techniques we use in our LLM framework, Megatron-LM, and discuss how these techniques can be combined to maximize the training throughput of large models while retaining strict optimizer semantics. I will also discuss optimization techniques for inference, including methods to accelerate inference and reduce memory fragmentation.
Speaker's Bio: Dr. Mohammad Shoeybi is the Director of Applied Research at NVIDIA. His team focuses on building large foundation models and adapting them to downstream applications. His team has built Megatron-LM, a framework for efficiently training LLMs, and used it to train several large-scale models such as Megatron-Turing NLG with 530 billion parameters. He received his PhD from Stanford University in 2010. Prior to NVIDIA, he worked at DeepMind and Baidu USA, leading efforts on bringing deep learning and reinforcement learning to applications.
Mohammad Shoeybi · Bryan Catanzaro

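As a concrete anchor for the parallelism discussion: in Megatron-LM, the total GPU count factors into tensor-, pipeline-, and data-parallel degrees. The flag names below match Megatron-LM's public CLI; the degrees themselves are an illustrative assumption, not a recommended layout.

```python
world_size = 512                     # total GPUs
tensor_parallel = 8                  # --tensor-model-parallel-size (splits matmuls, kept within a node)
pipeline_parallel = 8                # --pipeline-model-parallel-size (splits layers across nodes)
data_parallel = world_size // (tensor_parallel * pipeline_parallel)  # = 8 model replicas

assert tensor_parallel * pipeline_parallel * data_parallel == world_size
```
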
Sat 2:00 p.m. - 2:50 p.m. | Panel Discussion (Panel)
Yang You · Olatunji Ruwase · Natalia Vassilieva · Mohammad Shoeybi · Ana Gainaru · Lionel Eyraud-Dubois · Jean Kossaifi

Sat 2:50 p.m. - 3:00 p.m. | Closing Remarks (Talk)
Jean Kossaifi

Sat 3:00 p.m. - 3:30 p.m. | Poster Session

AI4HPC: Library to Train AI Models on HPC Systems using CFD Datasets (Poster)
This paper introduces AI4HPC, an open-source library designed to integrate Artificial Intelligence (AI) models and workflows in High-Performance Computing (HPC) systems for Computational Fluid Dynamics (CFD)-based applications. Developed by CoE RAISE, AI4HPC addresses challenges in handling intricate CFD datasets, model complexity, and scalability, and also includes extensive code optimizations to improve performance. Furthermore, the library encompasses data manipulation, specialized ML architectures, distributed training, hyperparameter optimization, and performance monitoring. Integrating AI and CFD in AI4HPC empowers efficient analysis of extensive and large-scale datasets. This paper outlines the architecture, components, and potential of AI4HPC to accelerate and augment data-driven fluid dynamics simulations and beyond, demonstrated by showing the scaling results of this library up to 3,664 GPUs.
Eray Inanc · Rakesh Sarma · Marcel Aach · Rocco Sedona · Andreas Lintermann

Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale (Poster)
The gradient noise scale is valuable to compute because it suggests a compute-efficient batch size for training a deep learning model. However, computing it can be awkward or expensive, depending on the approach taken, due to the difficulty of obtaining small-batch gradient norm estimates. "Efficient" per-example gradient norms provide accurate small-batch gradient norms but are inefficient in transformer or convolutional models. By assuming activations are normally distributed, we compute an approximate per-example gradient norm that tracks the true per-example gradient norm in practical settings. Using this approximation, we construct a Scaled Output Gradient Noise Scale (SOGNS) that is generally applicable at negligible cost and provides additional feedback to the practitioner during training.
Gavia Gray · Anshul Samar · Joel Hestness

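For background, the exact "efficient" per-example norm for a single linear layer factorizes without materializing per-example gradients, because each example's weight gradient is an outer product. A sketch of that baseline (the paper's contribution, a cheap approximation for transformer and convolutional models, is not reproduced here):

```python
import torch

a = torch.randn(64, 512)   # per-example inputs to a linear layer
g = torch.randn(64, 256)   # per-example gradients w.r.t. the layer's output

# grad_W for example i is the outer product g_i (x) a_i, so
# ||grad_W_i||_F = ||g_i|| * ||a_i||, with no (64, 256, 512) tensor needed.
per_example_norms = g.norm(dim=1) * a.norm(dim=1)

# Check against the naive outer-product computation for one example:
naive = torch.einsum('o,i->oi', g[0], a[0]).norm()
assert torch.allclose(per_example_norms[0], naive, atol=1e-5)
```
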
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators (Poster)
The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible to high-level frameworks (e.g., weights, activations, and gradients). However, current hardware still relies on high-accuracy core operations, most significantly the operation of accumulating products. This high-precision accumulation operation is gradually becoming the main computational bottleneck because, so far, the use of low-precision accumulators has led to a significant degradation in performance. In this work, we present a simple method to train and fine-tune high-end DNNs that allows, for the first time, the utilization of cheaper, 12-bit accumulators with no significant degradation in accuracy. Lastly, we show that as we decrease the accumulation precision further, using fine-grained gradient approximations can improve the DNN accuracy.
Yaniv Blumenfeld · Itay Hubara · Daniel Soudry

ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation (Poster)
In this research, we propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models. Knowledge Distillation trains a smaller "student" model with guidance from a larger "teacher" model, which is computationally costly. However, the main benefit comes from the soft labels provided by the teacher, helping the student grasp nuanced class similarities. In our work, we propose an efficient method for generating these soft labels, thereby eliminating the need for a large teacher model. We employ a compact autoencoder to extract essential features and calculate similarity scores between different classes. Afterward, we apply the softmax function to these similarity scores to obtain a soft probability vector. This vector serves as valuable guidance during the training of the student model. Our extensive experiments on various datasets, including CIFAR-100, Tiny Imagenet, and Fashion MNIST, demonstrate the superior resource efficiency of our approach compared to traditional knowledge distillation methods that rely on large teacher models. Importantly, our approach consistently achieves similar or even superior performance in terms of model accuracy. We also perform a comparative study with various techniques recently developed for knowledge distillation, showing that our approach achieves competitive performance while using significantly fewer resources. We also show that our approach can easily be added to any logit-based knowledge distillation method. This research contributes to making knowledge distillation more accessible and cost-effective for practical applications, making it a promising avenue for improving the efficiency of model training.
Divyang Doshi · Jung-Eun Kim

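A sketch of how soft labels could be derived from a compact encoder, as we read the abstract: cosine similarities between per-class feature prototypes, pushed through a softmax. The prototype construction and the temperature are placeholder assumptions, not the paper's exact recipe.

```python
import torch

def soft_labels_from_prototypes(class_prototypes: torch.Tensor, temperature=4.0):
    # class_prototypes: (num_classes, d) mean encoder features per class
    protos = torch.nn.functional.normalize(class_prototypes, dim=1)
    sim = protos @ protos.T                          # class-to-class cosine similarity
    return torch.softmax(sim / temperature, dim=1)   # one soft target row per class

protos = torch.randn(100, 64)                  # e.g. CIFAR-100 autoencoder features
targets = soft_labels_from_prototypes(protos)  # used in place of teacher logits
```
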
Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search (Poster)
Sequential recommender systems (SRS) have become a research hotspot due to their power in modeling user dynamic interests and sequential behavioral patterns. To maximize model expressive ability, a default choice is to apply a larger and deeper network architecture, which, however, often brings high network latency when generating online recommendations. Naturally, we argue that compressing the heavy recommendation models into middle- or light-weight neural networks that reduce inference latency while maintaining recommendation performance is of great importance for practical production systems. To realize such a goal, we propose AdaRec, a knowledge distillation (KD) framework which compresses knowledge of a teacher model into a student model adaptively according to its recommendation scene by using differentiable neural architecture search (NAS). Specifically, we introduce a target-oriented knowledge distillation loss to guide the network structure search process for finding the student network architecture, and a cost-sensitive loss as constraints for model size, which achieves a superior trade-off between recommendation effectiveness and efficiency. In addition, we leverage earth mover's distance (EMD) to realize many-to-many layer mapping during knowledge distillation, which enables each intermediate student layer to learn from other intermediate teacher layers adaptively. Extensive experiments on three real-world recommendation datasets demonstrate that our model achieves significantly better accuracy with notable inference speedup compared to strong counterparts, while discovering diverse architectures for sequential recommendation models under different recommendation scenes.
Lei Chen

Remaining-Useful-Life Prediction and Uncertainty Quantification using LSTM Ensembles for Aircraft Engines (Poster)
This paper proposes an "LSTM (Long Short-Term Memory) Ensemble" technique for building a regression model to predict the Remaining Useful Life (RUL) of aircraft engines along with uncertainty quantification, utilising the well-known run-to-failure turbo engine degradation dataset. It addresses the overlooked yet crucial aspect of uncertainty estimation in previous research by revamping the LSTM architecture to facilitate uncertainty estimates, employing the Negative Log-Likelihood (NLL) as the training criterion. Through a series of experiments, the model demonstrated self-awareness of its uncertainty levels, correlating high confidence with low prediction errors and vice versa. This initiative not only enhances predictive maintenance strategies but also significantly improves the safety and reliability of aviation assets by offering a more nuanced understanding of predictive uncertainties. To the best of our knowledge, this is pioneering work in this application domain from a non-Bayesian approach.
Oishi Deb · Emmanouil Benetos · Philip Torr

LightSeq: Sequence-Level Parallelism for Distributed Training of Long-Context Transformers (Poster)
Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprint of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes; they also cannot scale beyond the number of attention heads, which hinders their adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLM training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable to models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7× less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass a forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single- and cross-node training, we show that LightSeq achieves up to 1.24-2.01× end-to-end speedup, and a 2-8× longer sequence length on models with fewer heads, compared to Megatron-LM. Anonymized code is available at https://anonymous.4open.science/r/lightseq-anonymized.
Dacheng Li · Rulin Shao · Anze Xie · Eric Xing · Joseph Gonzalez · Ion Stoica · Xuezhe Ma · Hao Zhang

FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments (Poster)
As deep learning models become increasingly large, they pose significant challenges in heterogeneous device environments. The size of deep learning models makes it difficult to deploy them on low-power or resource-constrained devices, leading to long inference times and high energy consumption. To address these challenges, we propose FlexTrain, a framework that accommodates the diverse storage and computational resources available on different devices during the training phase. FlexTrain enables efficient deployment of deep learning models, while respecting device constraints, minimizing communication costs, and ensuring seamless integration with diverse devices. We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. We also extend FlexTrain to the federated learning setting, showing that our approach outperforms standard federated learning benchmarks on both the CIFAR-10 and CIFAR-100 datasets.
Mert Unsal · Ali Maatouk · Antonio De Domenico · Nicola Piovesan · Fadhel Ayed

DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers (Poster)
We devise, implement and performance-assess DYAD, a layer which can serve as a faster and more memory-efficient approximate replacement for linear layers (nn.Linear() in PyTorch). These layers appear in common subcomponents, such as in the ff module of Transformers. DYAD is based on a bespoke near-sparse matrix structure which approximates the dense "weight" matrix W that matrix-multiplies the input in the typical realization of such a layer, a.k.a. DENSE. Our alternative near-sparse matrix structure is decomposable into a sum of 2 matrices permutable to a block-sparse counterpart. These can be represented as 3D tensors, which in unison allow a faster execution of matrix multiplication with the mini-batched input matrix compared to DENSE (O(rows(W) × cols(W)) → O(rows(W) × cols(W) / (# of blocks))). As the crux of our experiments, we pretrain both DYAD and DENSE variants of 2 sizes of the OPT arch and 1 size of the Pythia arch, including at different token scales of the babyLM benchmark. We find DYAD to be competitive with (≥ 90% of) DENSE performance on zero-shot (e.g. BLIMP), few-shot (OPENLM) and finetuning (GLUE) benchmarks, while being ≥ 7-15% faster to train on-GPU even at 125m scale, besides surfacing larger speedups at increasing scale and model width.
Sarin Chandy · Varun Prashant Gangal · Yi Yang · Gabriel Maggiotti

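A sketch of the cost intuition only: a block-structured weight stored as a 3D tensor multiplies each input slice by its own block, which is where the O(rows(W) × cols(W)) → O(rows(W) × cols(W) / #blocks) saving comes from. DYAD's actual structure (a sum of two matrices permutable to block-sparse form) is richer than this block-diagonal toy.

```python
import torch

batch, num_blocks, in_blk, out_blk = 32, 8, 64, 64
w = torch.randn(num_blocks, out_blk, in_blk)   # blocks stored as a 3D tensor
x = torch.randn(batch, num_blocks * in_blk)

xb = x.view(batch, num_blocks, in_blk)
y = torch.einsum('bni,noi->bno', xb, w).reshape(batch, -1)
# A dense layer would cost (num_blocks*in_blk) * (num_blocks*out_blk) multiply-adds
# per example; this costs num_blocks * in_blk * out_blk, i.e. num_blocks x fewer.
```
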
Improving Deep Ensembles without Communication (Poster)
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised deep learning. We propose to improve deep ensembles by optimizing a tighter PAC-Bayesian bound than the most popular ones. Our approach has a number of benefits over previous methods: 1) it requires no communication between ensemble members during training to improve performance and is trivially parallelizable, 2) it results in a soft-thresholding gradient update that is much simpler than alternatives. Empirically, we outperform competing approaches that try to improve ensembles by encouraging diversity. We report test accuracy gains for MLP, LeNet, and WideResNet architectures, and for a variety of datasets.
Konstantinos Pitas · Michael Arbel · Julyan Arbel

ConcatPlexer: Additional Dim1 Batching for Faster ViTs (Contributed Talk & Poster)
Transformers have demonstrated tremendous success not only in the natural language processing (NLP) domain but also in the field of computer vision, igniting various creative approaches and applications. Yet, the superior performance and modeling flexibility of transformers comes with a severe increase in computation costs, and hence several works have proposed methods to reduce this burden. Inspired by a cost-cutting method originally proposed for language models, Data Multiplexing (DataMUX), we propose a novel approach for efficient visual recognition that employs additional dim1 batching (i.e., concatenation) to greatly improve throughput with little compromise in accuracy. We first introduce a naive adaptation of DataMUX for vision models, the Image Multiplexer, and devise novel components to overcome its weaknesses, rendering our final model, ConcatPlexer, at the sweet spot between inference speed and accuracy. ConcatPlexer was trained on the ImageNet1K and CIFAR100 datasets and achieved 23.5% fewer GFLOPs than ViT-B/16, with 69.5% and 83.4% validation accuracy, respectively.
Donghoon Han · Seunghyeon Seo · Donghyeon Jeon · Jiho Jang · Chaerin Kong · Nojun Kwak

InstaTune: Instantaneous Neural Architecture Search During Fine-Tuning (Poster)
One-Shot Neural Architecture Search (NAS) algorithms often rely on training a hardware-agnostic super-network for a domain-specific task. Optimal sub-networks are then extracted from the trained super-network for different hardware platforms. However, training super-networks from scratch can be extremely time-consuming and compute-intensive, especially for large models that rely on a two-stage training process of pre-training and fine-tuning. State-of-the-art pre-trained models are available for a wide range of tasks, but their large sizes significantly limit their applicability on various hardware platforms. We propose InstaTune, a method that leverages off-the-shelf pre-trained weights for large models and generates a super-network during the fine-tuning stage. InstaTune has multiple benefits. Firstly, since the process happens during fine-tuning, it minimizes the overall time and compute resources required for NAS. Secondly, the sub-networks extracted are optimized for the target task, unlike prior work that optimizes on the pre-training objective. Finally, InstaTune is easy to "plug and play" into existing frameworks. By using multi-objective evolutionary search algorithms along with lightly trained predictors, we find Pareto-optimal sub-networks that outperform their respective baselines across different performance objectives such as accuracy and MACs. Specifically, we demonstrate that our approach performs well across both unimodal (ViT and BERT) and multi-modal (BEiT-3) transformer-based architectures.
Sharath Nittur Sridhar · Souvik Kundu · Sairam Sundaresan · Maciej Szankin · Anthony Sarah

ReLoRA: High-Rank Training Through Low-Rank Updates (Poster)
Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, while training costs grow exponentially. In this paper, we explore parameter-efficient training techniques as an approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to training transformer language models with up to 1.3B parameters and demonstrate performance comparable to regular neural network training. ReLoRA saves up to 5.5 GB of memory per GPU and improves training speed by 9-40%, depending on the model size and hardware setup. Our findings show the potential of parameter-efficient techniques for large-scale pre-training.
Vladislav Lialin · Sherin Muckatira · Namrata Shivagunde · Anna Rumshisky

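One plausible reading of "high-rank training through low-rank updates", sketched under our own assumptions: train a LoRA pair for a while, merge it into the frozen base weight, re-initialize the pair, and repeat, so the accumulated change in W can exceed rank r. The paper's actual restart schedule and optimizer handling are not captured here.

```python
import torch

d, r = 512, 8
W = torch.randn(d, d) * 0.02                      # frozen base weight
A = (torch.randn(r, d) * 0.01).requires_grad_()   # trainable down-projection
B = torch.zeros(d, r, requires_grad=True)         # trainable, zero-init: update starts at 0

for restart in range(3):
    # ... train A, B for some steps with W frozen ...
    with torch.no_grad():
        W += B @ A                   # merge the rank-r update into the base
        A.normal_(mean=0.0, std=0.01)  # re-initialize the low-rank pair
        B.zero_()                    # each cycle contributes a fresh rank-r direction
```
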
Sparse Iso-FLOP Transformations for Maximizing Training Efficiency (Poster)
Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t. training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs, but training with sparse weights often leads to accuracy loss or requires longer training schedules, making the resulting training efficiency less clear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPs as the dense model, and show training efficiency gains through higher accuracy. In this work, we introduce Sparse-IFT, a family of Sparse Iso-FLOP Transformations which are used as drop-in replacements for dense layers to improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single hyperparameter (sparsity level) and provides a larger search space to find optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with Sparse-IFT leads to significant improvements across computer vision and natural language processing tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants that use 2x or more FLOPs. To our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models via a simple set of sparse transformations.
Vithursan Thangarasa · Shreyas Saxena · Abhay Gupta · Sean Lie

Embarrassingly Simple Dataset Distillation (Poster)
Training large-scale models generally requires enormous amounts of training data. Dataset distillation aims to extract a small set of synthetic training samples from a large dataset, with the goal of achieving competitive performance on test data when trained on this sample, thus reducing both dataset size and training time. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish a new dataset distillation state-of-the-art for a variety of standard dataset benchmarks.
Yunzhen Feng · Shanmukha Ramakrishna Vedantam · Julia Kempe

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs (Poster)
In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs). Different from the conventional KV cache that retains key and value vectors for all context tokens, we conduct targeted profiling to discern the intrinsic structure of attention modules. Based on the recognized structure, we then construct the KV cache in an adaptive manner: evicting long-range contexts on attention heads emphasizing local contexts, discarding non-special tokens on attention heads centered on special tokens, and only employing the standard KV cache for attention heads that broadly attend to all tokens. Moreover, with the lightweight attention profiling used to guide the construction of the adaptive KV cache, our method, FastGen, can be deployed without resource-intensive fine-tuning or re-training. In our experiments across various tasks, FastGen demonstrates substantial reductions in GPU memory consumption with negligible generation quality loss. We will release our code and the compatible CUDA kernel for reproducibility.
Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao

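A sketch of just one of the eviction policies named in the abstract, with placeholder shapes: heads profiled as "local" keep only a recent window of the cache, while broadly-attending heads keep everything. The profiling step and the special-token policy are omitted.

```python
import torch

def compress_kv(k, v, head_is_local, window=512):
    # k, v: (num_heads, seq_len, head_dim); head_is_local: (num_heads,) bool mask
    out_k, out_v = [], []
    for h in range(k.shape[0]):
        if head_is_local[h]:
            out_k.append(k[h, -window:])   # evict long-range context on local heads
            out_v.append(v[h, -window:])
        else:
            out_k.append(k[h])             # standard full KV cache elsewhere
            out_v.append(v[h])
    return out_k, out_v                    # ragged: per-head cache lengths now differ

k = torch.randn(16, 4096, 128)
v = torch.randn(16, 4096, 128)
local_heads = torch.rand(16) < 0.5         # placeholder for the profiling decision
k_c, v_c = compress_kv(k, v, local_heads)
```
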
A Quadratic Synchronization Rule for Distributed Deep Learning (Poster)
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While $H$ has been viewed as a hyperparameter to trade optimization efficiency for communication cost, recent research indicates that setting a proper $H$ value can lead to generalization improvement. Yet, selecting a proper $H$ is elusive. This work proposes a theory-grounded method for determining $H$, named the Quadratic Synchronization Rule (QSR), which recommends dynamically setting $H$ in proportion to $\frac{1}{\eta^2}$ as the learning rate $\eta$ decays over time. Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. Compared to standard data-parallel training, QSR enables Local AdamW to cut the training time on 16 or 64 GPUs down from 26.7 to 20.2 hours or from 8.6 to 5.5 hours and, at the same time, achieves $1.16\%$ or $0.84\%$ higher top-1 validation accuracy.
Xinran Gu · Kaifeng Lyu · Sanjeev Arora · Jingzhao Zhang · Longbo Huang

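The rule itself is a one-liner; the proportionality constant and floor below are placeholders, not the paper's values.

```python
def qsr_sync_interval(lr, alpha=0.01, h_min=2):
    # H proportional to 1 / lr^2, floored at a minimum interval of local steps
    return max(h_min, int(alpha / lr ** 2))

for lr in [0.01, 0.005, 0.001]:
    print(lr, qsr_sync_interval(lr))   # 100, 400, 10000 local steps between syncs
```
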
DAREL: Data Reduction with Losses for Training Acceleration of Real and Hypercomplex Neural Networks (Poster)
Neural network training requires a lot of resources, and there are situations where training time and memory usage are limited. In such instances, devising specialized algorithms for training neural networks under resource constraints becomes significant. Data Reduction with Losses (DAREL) is a novel training data reduction method that operates on training samples based on losses obtained from the currently trained model or a pre-trained one. The proposed method is applicable to training Deep Neural Networks for both Computer Vision and Natural Language Processing tasks. Applied to Large Language Model fine-tuning, Data Reduction with Losses can be combined with existing methods for parameter-efficient fine-tuning, such as LoRA. Computational experiments demonstrate the superiority of the proposed approach over existing methods for pre-training neural networks on Computer Vision tasks, and provide clear evidence of improved fine-tuning quality and time for Large Language Models. Training acceleration for ResNet18 is up to 2.03x, while DAREL achieves a 1.43x acceleration for GPT2-M fine-tuning with a corresponding increase in BLEU of 1.81 p.p.
Alexander Demidovskij · Aleksei Trutnev · Artyom Tugaryov · Igor Salnikov · Stanislav Pavlov

Accelerating Deep Learning using Ivy (Poster)
Today's machine learning (ML) ecosystem suffers from deep fragmentation due to the proliferation of numerous incompatible frameworks, compiler infrastructure and hardware. Each unique tool within this fragmented stack has its own set of benefits and drawbacks, making it better suited for certain use-cases. As a result, different areas of industry and academia use different tools for different use cases, which hinders collaboration and democratization, ultimately resulting in costly re-implementations and sub-optimal runtime efficiency when deploying, due to sparse and partial connections to the rest of the stack. In this paper, we present Ivy, a complementary, multi-backend ML framework, and its transpiler, which aims to bridge this gap and solve the fragmentation problem by enabling the integration of code from one framework into another to speed up research, development, and model inference.
Guillermo Sanchez-Brizuela · Ved Patwardhan · Matthew Barrett · Paul Anderson · Mustafa Hani · Daniel Lenton

Something for (almost) nothing: improving deep ensemble calibration using unlabeled data (Poster)
We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data. Our approach is extremely simple to implement: given an unlabeled set, for each unlabeled data point, we simply fit a different randomly selected label with each ensemble member. We provide a theoretical analysis based on a PAC-Bayes bound which guarantees that if we fit such a labeling on unlabeled data, and the true labels on the training data, we obtain low negative log-likelihood and high ensemble diversity on testing samples. Crucially, each ensemble member can be trained independently from the rest (apart from the final validation/test step), making a parallel or distributed implementation extremely easy.
Konstantinos Pitas · Julyan Arbel

LeanFlex-GKP: Advancing Hassle-Free Structured Pruning with Simple Flexible Group Count (Poster)
Densely structured pruning methods, which generate pruned models in a fully dense format and thereby allow immediate compression benefits without additional demands, are evolving due to their practical significance. Traditional techniques in this domain mainly revolve around coarser granularities, such as filter pruning, and thereby limit performance due to restricted pruning freedom. Recent advancements in Grouped Kernel Pruning (GKP) have enabled the utilization of finer granularities while maintaining a densely structured format. We observe that existing GKP methods often introduce dynamic operations to different aspects of their procedures at the cost of adding complications and/or imposing limitations (e.g., requiring an expensive mixture of clustering schemes), or contain dynamic pruning rates and sizes among groups, which results in a reliance on custom architecture support for the pruned models. In this work, we argue that the best practice to introduce these dynamic operations to GKP is to make
Jiamu Zhang · Shaochen (Henry) Zhong · Andrew Ye · Zirui Liu · Kaixiong Zhou · Xia Hu · Shuai Xu · Vipin Chaudhary

Patch Gradient Descent: Training Neural Networks on Very Large Images (Poster)
Current deep learning models falter when faced with large-scale images, largely due to prohibitive computing and memory demands. Enter Patch Gradient Descent (PatchGD), a groundbreaking learning technique that seamlessly trains deep learning models on expansive images. This innovation takes inspiration from the standard feedforward-backpropagation paradigm. However, instead of processing an entire image simultaneously, PatchGD smartly segments and updates a core information-gathering element using portions of the image before the final evaluation. This ensures wide coverage across iterations, bringing in notable memory and computational efficiencies. When tested on the high-resolution PANDA and UltraMNIST datasets using ResNet50 and MobileNetV2 models, PatchGD clearly outstrips traditional gradient descent techniques, particularly under memory constraints. The future of handling vast image datasets effectively lies with PatchGD.
Deepak Gupta · Gowreesh Mago · Arnav Chavan · Dilip K. Prasad · Rajat Thomas

Batched Low-Rank Adaptation of Foundation Models (Poster)
Low-Rank Adaptation (LoRA) has recently gained attention for fine-tuning foundation models by incorporating trainable low-rank matrices, thereby reducing the number of trainable parameters. While LoRA offers numerous advantages, its applicability for real-time serving to a diverse and global user base is constrained by its incapability to handle multiple task-specific adapters efficiently. This imposes a performance bottleneck in scenarios requiring personalized, task-specific adaptations for each incoming request. To address this, we introduce FLORA (Fast LoRA), a framework in which each input example in a minibatch can be associated with its unique low-rank adaptation weights, allowing for efficient batching of heterogeneous requests. We empirically demonstrate that FLORA retains the performance merits of LoRA, showcasing competitive results on the MultiPL-E code generation benchmark spanning over 6 languages.
Yeming Wen · Swarat Chaudhuri

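A sketch of the batching idea as described: each example in the minibatch carries its own low-rank adapter, applied with batched matmuls instead of a loop over adapters. Shapes are illustrative assumptions.

```python
import torch

batch, d_in, d_out, r = 16, 512, 512, 8
x = torch.randn(batch, d_in)
W = torch.randn(d_out, d_in)        # shared frozen base weight
A = torch.randn(batch, r, d_in)     # per-example adapter (down-projection)
B = torch.randn(batch, d_out, r)    # per-example adapter (up-projection)

base = x @ W.T                                                  # one shared matmul
low_rank = torch.bmm(B, torch.bmm(A, x.unsqueeze(-1))).squeeze(-1)  # batched adapters
y = base + low_rank                 # heterogeneous requests served in a single batch
```
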
Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models (Poster)
We present Local LoRA, a memory-flexible fine-tuning approach that, in principle, can fine-tune an arbitrarily large model on fixed hardware, including consumer-grade GPUs. Our approach aims to decouple the size of the model from the memory required to fine-tune it by dividing the model into chunks and sequentially fine-tuning each chunk. Our results show that Local LoRA closes the gap between the un-tuned model and end-to-end LoRA on math reasoning tasks.
Oscar Key · Jean Kaddour · Pasquale Minervini

Early Weight Averaging meets High Learning Rates for LLM Pre-training (Poster)
Training Large Language Models (LLMs) incurs significant cost; hence, any strategy that accelerates model convergence is helpful. In this paper, we investigate the ability of a simple idea, checkpoint averaging along the trajectory of a training run, to improve both convergence and generalization quite early during training. Here we show that models trained with high learning rates observe higher gains due to checkpoint averaging. Furthermore, these gains are amplified when checkpoints are sampled with considerable spacing in training steps. Our training recipe outperforms conventional training and popular checkpoint averaging baselines such as the exponential moving average (EMA) and stochastic weight averaging (SWA). We evaluate our training recipe by pre-training LLMs, where high learning rates are inherently preferred due to extremely large batch sizes. Specifically, we pre-trained nanoGPT-2 models of varying sizes, small (125M), medium (335M), and large (770M), on the OpenWebText dataset, comprised of 9B tokens. Additionally, we present results for publicly available Pythia LLMs, ranging from 1B to 12B, which were trained on the PILE-deduped dataset containing 207B tokens.
Sunny Sanyal · Atula Neerkaje · Jean Kaddour · Abhishek Kumar · Sujay Sanghavi

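A minimal sketch of checkpoint averaging along a single run, with placeholder spacing and warmup values: sample checkpoints with wide spacing and keep a running mean of the parameters.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(10, 10)            # stand-in for the LLM being trained
avg_state, num_ckpts = None, 0

for step in range(1, 20_001):
    # ... one optimizer step on `model` ...
    if step % 5_000 == 0:            # considerable spacing between sampled checkpoints
        if avg_state is None:
            avg_state = copy.deepcopy(model.state_dict())
        else:
            for k, v in model.state_dict().items():
                avg_state[k] += (v - avg_state[k]) / (num_ckpts + 1)  # running mean
        num_ckpts += 1

# evaluate with the averaged weights: model.load_state_dict(avg_state)
```
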
Bandit-Driven Batch Selection for Robust Learning under Label Noise (Poster)
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging combinatorial bandit algorithms. Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets. Experimental evaluations on the CIFAR-10 dataset reveal that our approach consistently outperforms existing methods across various levels of label corruption. Importantly, we achieve this superior performance without incurring the computational overhead commonly associated with auxiliary neural network models. This work presents a balanced trade-off between computational efficiency and model efficacy, offering a scalable solution for complex machine learning applications.
Michal Lisicki · Graham Taylor · Mihai Nica

Maestro: Uncovering Low-Rank Structures via Trainable Decomposition (Poster)
Deep Neural Networks (DNNs) have been a large driver and enabler for AI breakthroughs in recent years. These models have been getting larger in their attempt to become more accurate and tackle new upcoming use-cases, including AR/VR and intelligent assistants. However, the training process of such large models is a costly and time-consuming process, which typically yields a single model to fit all targets. To mitigate this, various techniques have been proposed in the literature, including pruning, sparsification or quantization of the model weights and updates. While able to achieve high compression rates, they often incur computational overheads or accuracy penalties. Alternatively, factorization methods have been leveraged to incorporate low-rank compression in the training process. Similarly, such techniques (e.g., SVD) frequently rely on the computationally expensive decomposition of layers and are potentially sub-optimal for non-linear models, such as DNNs. In this work, we take a further step in designing efficient low-rank models and propose Maestro, a framework for trainable low-rank layers. Instead of regularly applying a priori decompositions such as SVD, the low-rank structure is built into the training process through a generalized variant of Ordered Dropout. This method imposes an importance ordering via sampling on the decomposed DNN structure. Our theoretical analysis demonstrates that our method recovers the SVD decomposition of linear mapping on uniformly distributed data and PCA for linear autoencoders. We further apply our technique on DNNs and empirically illustrate that Maestro enables the extraction of lower footprint models that preserve model performance while allowing for graceful accuracy-latency tradeoffs for deployment to devices of different capabilities.
Samuel Horváth · Stefanos Laskaridis · Shashank Rajput · Hongyi Wang

Tiny Graph Convolutional Networks with Topologically Consistent Magnitude Pruning (Poster)
Magnitude pruning is one of the mainstream methods in lightweight architecture design whose goal is to extract subnetworks with the largest weight connections. This method is known to be successful, but under very high pruning regimes, it suffers from topological inconsistency which renders the extracted subnetworks disconnected, and this hinders their generalization ability. In this paper, we devise a novel end-to-end Topologically Consistent Magnitude Pruning (TCMP) method that allows extracting subnetworks while guaranteeing their topological consistency. The latter ensures that only accessible and co-accessible (impactful) connections are kept in the resulting lightweight architectures. Our solution is based on a novel reparametrization and two supervisory bi-directional networks which implement accessibility/co-accessibility and guarantee that only connected subnetworks will be selected during training. This solution allows enhancing generalization significantly, under very high pruning regimes, as corroborated through extensive experiments, involving graph convolutional networks, on the challenging task of skeleton-based action recognition.
Hichem SAHBI

DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency (Poster)
This paper introduces DONUT-hole, a sparse OCR-free visual document understanding (VDU) model that addresses the limitations of its predecessor model, dubbed DONUT. The DONUT model leverages a transformer architecture to overcome the challenges of separate optical character recognition (OCR) and visual semantic understanding (VSU) components. However, its deployment in production environments and edge devices is hindered by high memory and computational demands, particularly in large-scale request services. To overcome these challenges, we propose an optimization strategy based on knowledge distillation and model pruning. Our paradigm for producing DONUT-hole reduces the model density by 54% while preserving performance. We also achieve a global representational similarity index between DONUT and DONUT-hole of 0.79, based on the centered kernel alignment (CKA) metric. Moreover, we evaluate the effectiveness of DONUT-hole on the document image key information extraction (KIE) task, highlighting its potential for developing more efficient VDU systems for logistics companies.
Azhar Shaikh · Michael Cochez · Denis Diachkov · Michiel de Rijcke · Sahar Yousefi

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (Poster)
The popularity of LLaMA and other recently emerged moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs. Regardless, the cost of training such models from scratch on trillions of tokens remains high. In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models. Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains. We demonstrate the efficacy of our approach by presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art open-source models of equivalent sizes, such as Pythia, INCITE, and OpenLLaMA models, on a wide range of downstream and instruction tuning evaluations, while requiring less than 3% of compute compared to training such models from scratch. This work provides compelling evidence that leveraging existing LLMs with structured pruning is a far more cost-effective approach for building smaller LLMs.
Mengzhou Xia · Tianyu Gao · Zhiyuan Zeng · Danqi Chen

-
|
A foundation for exact binarized morphological neural networks
(
Poster
)
>
link
Training and running deep neural networks (NNs) often demands substantial computation and energy-intensive specialized hardware (e.g., GPUs, TPUs). One way to reduce the computation and power cost is to use binary-weight NNs, but these are hard to train because the sign function has a non-smooth gradient. We present a model based on Mathematical Morphology (MM) that can binarize ConvNets without losing performance under certain conditions, but these conditions may not be easy to satisfy in real-world scenarios. To solve this, we propose two new approximation methods and develop a robust theoretical framework for binarizing ConvNets using MM. We also propose regularization losses to improve the optimization. We empirically show that our model can learn a complex morphological network, and we explore its performance on a classification task. |
Theodore Aouad · Hugues Talbot 🔗 |
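The non-smooth sign gradient mentioned above is commonly worked around with a straight-through estimator (STE). The generic STE below is a baseline sketch for context, not the paper's morphological construction.

```python
# Straight-through estimator for weight binarization: forward uses sign(w),
# backward passes the gradient through where |w| <= 1 and clips it elsewhere.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(8, requires_grad=True)
loss = BinarizeSTE.apply(w).sum()
loss.backward()  # w.grad is well-defined despite the non-smooth forward
```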
-
|
Training Bayesian Neural Networks with Sparse Subspace Variational Inference
(
Poster
)
>
link
Bayesian neural networks (BNNs) offer uncertainty quantification but come with substantially increased training and inference costs. Sparse BNNs have been investigated for efficient inference, typically by either gradually introducing sparsity during training or compressing dense BNNs after training. The dilemma of how to cut massive training costs remains, particularly given the requirement to learn the uncertainty. To solve this challenge, we introduce Sparse Subspace Variational Inference (SSVI), the first fully sparse BNN framework that maintains a consistently sparse Bayesian model throughout the training and inference phases. Starting from a randomly initialized low-dimensional sparse subspace, our approach alternately optimizes the sparse subspace basis selection and its associated parameters. Since basis selection is a non-differentiable problem, we approximate the optimal solution with a removal-and-addition strategy, guided by novel criteria based on weight-distribution statistics. Our extensive experiments show that SSVI sets new benchmarks in crafting sparse BNNs, achieving, for instance, 10-20× compression in model size with comparable performance and up to 20× FLOPs reduction during training. Remarkably, SSVI also demonstrates enhanced robustness to hyperparameters, reducing the need for intricate tuning in VI and occasionally even surpassing VI-trained dense BNNs. |
Junbo Li · Zichen Miao · Qiang Qiu · Ruqi Zhang 🔗 |
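A toy version of the removal-and-addition strategy might drop the active weights with the lowest posterior signal-to-noise ratio and activate an equal number of inactive positions. Both the criterion and the random growth rule below are stand-ins; the paper's weight-distribution statistics may differ.

```python
# Toy removal-and-addition step over a sparse set of variational parameters.
import torch

def update_active_set(mu, log_sigma, active, n_swap):
    """Drop the n_swap active weights with the lowest |mu|/sigma and
    activate the same number of currently inactive positions at random."""
    snr = mu.abs() / log_sigma.exp()
    active_idx = active.nonzero().squeeze(-1)
    drop = active_idx[snr[active_idx].argsort()[:n_swap]]
    active[drop] = False
    inactive_idx = (~active).nonzero().squeeze(-1)
    grow = inactive_idx[torch.randperm(inactive_idx.numel())[:n_swap]]
    active[grow] = True
    return active

mu, log_sigma = torch.randn(100), torch.randn(100) * 0.1
active = torch.zeros(100, dtype=torch.bool)
active[:10] = True  # a 10%-sparse subspace to start from
active = update_active_set(mu, log_sigma, active, n_swap=2)
```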
-
|
Task Arithmetic with LoRA for Continual Learning
(
Poster
)
>
link
Continual learning refers to the setting where training data becomes available in sequential chunks, termed "tasks". Much of the progress in continual learning has been stunted by catastrophic forgetting, which is caused by sequentially training the model on streams of data. Moreover, it becomes computationally expensive to sequentially train large models multiple times. To mitigate both of these problems at once, we propose a novel method to continually train transformer-based vision models using low-rank adaptation and task arithmetic. Our method completely bypasses the problem of catastrophic forgetting and reduces the computational requirement for training models on each task. When aided by a small memory of 10 samples per class, our method achieves performance close to full-set fine-tuning. We present rigorous ablations to support the effectiveness of our method. |
Rajas Chitale · Ankit Vaidya · Aditya Kane · Archana Ghotkar 🔗 |
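Task arithmetic treats each task's weight update as a vector that can be added to the base model. With LoRA, that update is the low-rank product of the adapter matrices, so merging tasks reduces to summing these products into the frozen weight, as sketched below; shapes, scaling, and names are illustrative, not the authors' exact procedure.

```python
# Merging per-task LoRA adapters by summing their low-rank task vectors.
import torch

def merge_lora_tasks(base_weight, adapters, scale=1.0):
    """adapters is a list of (A, B) pairs with A: (r, in_dim), B: (out_dim, r)."""
    merged = base_weight.clone()
    for A, B in adapters:
        merged += scale * (B @ A)  # each task vector is a low-rank delta
    return merged

d_out, d_in, r = 64, 64, 4
base = torch.randn(d_out, d_in)  # frozen pre-trained weight
task_adapters = [(torch.randn(r, d_in) * 0.01, torch.randn(d_out, r) * 0.01)
                 for _ in range(3)]  # one adapter per task
merged = merge_lora_tasks(base, task_adapters)
```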
-
|
Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning
(
Poster
)
>
link
Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems, and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without cost. In applications such as materials design, deep-sea and planetary robot exploration, and medicine, however, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the recently growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature, and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature. |
Colin Bellinger · Mark Crowley · Isaac Tamblyn 🔗 |
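The core loop of a multi-step observationless agent can be sketched as: pay for one observation, commit to an action plus a number of steps to repeat it blindly, then observe again. Everything below (the stub environment, the policy stand-in) is invented for illustration and is not the DMSOA implementation.

```python
# Decision loop that pays for one measurement per decision, then acts
# without observing for a policy-chosen number of steps.
import torch

class StubEnv:
    """Stand-in for the Gym/Atari environments used in the paper."""
    def reset(self):
        return torch.zeros(4)
    def step(self, action):
        return torch.randn(4), 1.0, False  # obs, reward, done

def act_and_skip(obs):
    """Placeholder policy head returning an action and a skip count."""
    logits_action, logits_skip = torch.randn(2), torch.randn(4)
    return int(logits_action.argmax()), int(logits_skip.argmax()) + 1

env = StubEnv()
obs = env.reset()
measurements = 0
for _ in range(10):
    action, skip = act_and_skip(obs)  # one costly measurement per decision
    measurements += 1
    for _ in range(skip):             # repeat the action without observing
        obs, reward, done = env.step(action)
        if done:
            obs = env.reset()
            break
```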
-
|
Cooperative Learning for Cost-Adaptive Inference
(
Poster
)
>
link
We propose a cooperative training framework for deep neural network architectures that enables the runtime network depth to change to satisfy dynamic computing-resource requirements. In our framework, the number of layers participating in computation can be chosen dynamically to meet performance-cost trade-offs at inference time. Our method trains two Teammate nets, a Leader net, and two sets of Teammate sub-networks of various depths through knowledge distillation. The Teammate nets derive sub-networks and transfer knowledge to them, and to each other, while the Leader net guides the Teammate nets to ensure accuracy. The framework is trained atomically, all at once, instead of individually training models of various sizes; in a sense, the various-sized networks are all trained together, in a "package deal." The proposed framework is not tied to any specific architecture and can incorporate existing models/architectures; it therefore maintains stable results and is insensitive to the size of a dataset's feature maps. Compared with related approaches, it provides accuracy comparable to its full network while making models of various sizes available. |
Xingli Fang · Richard Bradford · Jung-Eun Kim 🔗 |
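Depth-adaptive inference of the kind described above can be sketched with shared blocks and one classifier head per depth, so that any prefix of the network is a usable sub-network. The module below is an illustrative skeleton only; the Teammate/Leader distillation that would train it is omitted, and all names are invented.

```python
# A network whose runtime depth is chosen per inference call.
import torch
import torch.nn as nn

class DepthAdaptiveNet(nn.Module):
    def __init__(self, width=32, n_blocks=4, n_classes=10):
        super().__init__()
        self.stem = nn.Linear(16, width)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU())
            for _ in range(n_blocks))
        # One classifier head per depth, so any prefix of blocks is usable.
        self.heads = nn.ModuleList(
            nn.Linear(width, n_classes) for _ in range(n_blocks))

    def forward(self, x, depth):
        h = self.stem(x)
        for block in self.blocks[:depth]:
            h = block(h)
        return self.heads[depth - 1](h)

net = DepthAdaptiveNet()
x = torch.randn(2, 16)
cheap, full = net(x, depth=1), net(x, depth=4)  # pick depth per budget
```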
-
|
Generalisable Agents for Neural Network Optimisation
(
Poster
)
>
link
Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO): a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsively scheduling hyperparameters during training. GANNO utilises an agent per layer that observes localised network dynamics and accordingly takes actions to adjust these dynamics at a layerwise level, collectively improving global performance. In this paper, we use GANNO to control the layerwise learning rate and show that the framework can yield useful and responsive schedules that are competitive with handcrafted heuristics. Furthermore, GANNO performs robustly across a wide variety of unseen initial conditions and can successfully generalise to harder problems than it was trained on. Our work presents an overview of the opportunities that this paradigm offers for training neural networks, along with key challenges that remain to be overcome. |
Kale-ab Tessera · Callum R. Tilbury · Sasha Abramowitz · Ruan John de Kock · Omayma Mahjoub · Benjamin Rosman · Sara Hooker · Arnu Pretorius 🔗 |
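Layerwise learning-rate control of the kind GANNO learns can be sketched with one stand-in "agent" per optimizer parameter group, each observing local gradient and weight norms and rescaling its group's learning rate. The hand-written rule below is only a placeholder for the trained MARL policies.

```python
# Per-layer learning-rate rescaling via optimizer parameter groups.
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 2))
optimizer = torch.optim.SGD(
    [{"params": layer.parameters()} for layer in model], lr=0.1)

def layer_agent(grad_norm, weight_norm):
    """Placeholder per-layer policy: shrink the lr when gradients dwarf weights."""
    return 0.5 if grad_norm > weight_norm else 1.1

x, y = torch.randn(4, 8), torch.randn(4, 2)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
for layer, group in zip(model, optimizer.param_groups):
    g = sum(p.grad.norm() for p in layer.parameters())
    w = sum(p.detach().norm() for p in layer.parameters())
    group["lr"] *= layer_agent(g, w)  # each "agent" acts on its own layer
optimizer.step()
```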