Workshop

Optimization for ML Workshop

Jelena Diakonikolas · Dan Garber · Cristóbal Guzmán · Courtney Paquette · Sebastian Stich

Project Page [OpenReview]

Abstract

Optimization lies at the heart of many machine learning algorithms and enjoys great interest in our community. Indeed, this intimate relation of optimization with ML is the key motivation for the OPT series of workshops. We aim to foster discussion, discovery, and dissemination of state-of-the-art research in optimization relevant to ML.

The focus of OPT 2024 is on "Scaling up optimization". The advent of large language models (LLMs) has changed our perception of the optimization landscape and is raising interesting new questions related to scaling. For instance, we can view optimization as a sequence of problems parameterized by the size of the model. Questions naturally arise around scaling and optimization. Are there natural model-size-dependent learning rates that allow extrapolation from smaller models to larger ones, thereby facilitating fine-tuning? Given a fixed compute budget, how should one choose the hyper-parameters of the model (e.g., width, depth, architecture, batch size) so as to minimize the loss function? How dependent are these scaling laws on the optimization algorithm? Answers to these questions would have a huge impact in AI, saving time and millions of dollars in training and helping reduce AI’s environmental impact through lower energy costs. The new area of scaling laws and its deep ties to the optimization community warrant discussion.

Schedule

Timezone: America/Los_Angeles

9:00 AM

Opening Remarks

Courtney Paquette

Video

9:01 AM

Optimizing Optimization Methods with Computer Assistance, Ben Grimmer

Benjamin Grimmer

Video

9:30 AM

Talk 1: *On the Inherent Privacy of Two Point Zeroth Order Projected Gradient Descent* and Talk 2: *The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization*

Devansh Gupta · Matan Schliserman

Video

10:00 AM

Poster Session 1

11:00 AM

Talk 1: *SOAP: Improving and Stabilizing Shampoo using Adam* and Talk 2: *μLO: Compute-Efficient Meta-Generalization of Learned Optimizers*

Depen Morwani · Benjamin Thérien

Video

11:30 AM

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning, Misha Belkin

Misha Belkin

Video

12:00 PM

Lunch

2:00 PM

Acceleration by Stepsize Hedging, Jason Altschuler

Jason Altschuler

Video

2:30 PM

Talk 1: *MindFlayer: Efficient Asynchronous Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times* and Talk 2: *Provable non-accelerations of the heavy-ball method*

Artavazd Maranjyan · Gauthier Gidel

Video

3:00 PM

Poster Session 2

4:00 PM

Online Learning Guided Quasi-Newton Methods: Improved Global Non-asymptotic Guarantees, Aryan Mokhtari

Aryan Mokhtari

Video

4:30 PM

Future of OPT-ML, Discussion

Cristóbal Guzmán

Video

5:00 PM

Closing Remarks

Cristóbal Guzmán

Video

Remove Symmetries to Control Model Expressivity and Improve Optimization

Liu Ziyin · Yizhou Xu · Isaac Chuang