Workshop

OPT 2021: Optimization for Machine Learning

Courtney Paquette · Quanquan Gu · Oliver Hinder · Katya Scheinberg · Sebastian Stich · Martin Takac

Abstract

OPT 2021 will bring together experts in optimization to share their perspectives, alongside crossover experts in ML who will share their views and recent advances. OPT 2021 honors the tradition of bringing together people from optimization and from ML in order to promote and generate new interactions between the two communities.

To foster the spirit of innovation and collaboration, a goal of this workshop, OPT 2021 will focus the contributed talks on research in “Beyond Worst-case Complexity”. Classical optimization analyses measure the performance of algorithms based on (1) the computational cost and (2) convergence for any input to the algorithm. Yet algorithms with worse traditional complexity (e.g., SGD and its variants, Adam, etc.) are increasingly popular in practice for training deep neural networks and other ML tasks. This raises questions such as: what are good modeling assumptions for ML problems under which to measure an optimization algorithm’s success, and how can we leverage them to better understand the performance of known (and new) algorithms? For instance, typical optimization problems in ML may be better conditioned than their worst-case counterparts, in part because the problems are highly structured and/or high-dimensional (large number of features/samples). One could leverage this observation to design algorithms with better “average-case” complexity. Moreover, a growing body of research seems to indicate an intimate connection between the optimization algorithm and how well the resulting model performs on test data (generalization). This new area of research in ML and its deep ties to optimization warrant a discussion between the two communities. Specifically, we aim to continue the discussion on the precise meaning of generalization and average-case complexity and to formalize what this means for optimization algorithms. By bringing together experts in both fields, OPT 2021 will foster insightful discussions around these topics and more.

Schedule

Timezone: America/Los_Angeles

3:15 AM

Welcome event (gather.town) Social event/Break

3:58 AM

Opening Remarks to Session 1 Organizer intro

Sebastian Stich

4:00 AM

Deep Learning: Success, Failure, and the Border between them, Shai Shalev-Shwartz Plenary Speaker

Shai Shalev-Shwartz

4:25 AM

Q&A with Shai Shalev-Shwartz Q&A

Shai Shalev-Shwartz

4:30 AM

Learning with Strange Gradients, Martin Jaggi Plenary Speaker

Martin Jaggi

4:55 AM

Q&A with Martin Jaggi Q&A

Martin Jaggi

5:00 AM

Contributed Talks in Session 1 (Zoom) Orals and spotlights

Sebastian Stich · Futong Liu · Abdurakhmon Sadiev · Frederik Benzing · Simon Roburin

5:30 AM

Poster Session 1 (gather.town) Poster session

Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani

6:30 AM

Break (gather.town)

6:58 AM

Opening Remarks to Session 2 Organizer intro

Courtney Paquette

7:00 AM

The global optimization of functions with low effective dimension - better than worst-case?, Coralia Cartis Plenary Speaker

Coralia Cartis

7:25 AM

Q&A with Coralia Cartis Q&A

Coralia Cartis

7:30 AM

Non-Euclidean Differentially Private Stochastic Convex Optimization, Cristóbal Guzmán Plenary Speaker

Cristóbal Guzmán

7:55 AM

Q&A with Cristóbal Guzmán Q&A

Cristóbal Guzmán

8:00 AM

Contributed Talks in Session 2 (Zoom) Orals and spotlights

Courtney Paquette · Chris Junchi Li · Jeffery Kline · Junhyung Lyle Kim · Pascal Esser

8:30 AM

Break

9:58 AM

Opening Remarks to Session 3 Organizer intro

Oliver Hinder

10:00 AM

Avoiding saddle points in nonsmooth optimization, Damek Davis Plenary Speaker

Damek Davis

10:25 AM

Q&A with Damek Davis Q&A

Damek Davis

10:30 AM

Faster Empirical Risk Minimization, Jelena Diakonikolas Plenary Speaker

Jelena Diakonikolas

10:55 AM

Q&A with Jelena Diakonikolas Q&A

Jelena Diakonikolas

11:00 AM

Contributed Talks in Session 3 (Zoom) Orals and spotlights

Oliver Hinder · Wenhao Zhan · Akhilesh Soni · Grigory Malinovsky · Boyue Li

11:30 AM

Poster Session 2 (gather.town) Poster session

Wenjie Li · Akhilesh Soni · Jinwuk Seok · Jianhao Ma · Jeffery Kline · Mathieu Tuli · Miaolan Xie · Robert Gower · Quanqi Hu · Matteo Cacciola · Yuanlu Bai · Boyue Li · Wenhao Zhan · Shentong Mo · Junhyung Lyle Kim · Sajad Fathi Hafshejani · Chris Junchi Li · Zhishuai Guo · Harshvardhan Harshvardhan · Neha Wadia · Tatjana Chavdarova · Difan Zou · Zixiang Chen · Aman Gupta · Jacques Chen · Betty Shea · Benoit Dherin · Aleksandr Beznosikov

12:30 PM

Break (gather.town)

12:58 PM

Opening Remarks to Session 4 Organizer intro

Quanquan Gu

1:00 PM

Online Learning via Linear Programming, Yinyu Ye Plenary Speaker

Yinyu Ye

1:25 PM

Q&A with Yinyu Ye Q&A

Yinyu Ye

1:30 PM

Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney Plenary Speaker

Michael Mahoney

1:55 PM

Q&A with Michael Mahoney Q&A

Michael Mahoney

2:00 PM

Contributed Talks in Session 4 (Zoom) Orals and spotlights

Quanquan Gu · Agnieszka Słowik · Jacques Chen · Neha Wadia · Difan Zou

2:30 PM

Closing remarks Organizer closing

Courtney Paquette

Integer Programming Approaches To Subspace Clustering With Missing Data Poster

Akhilesh Soni · Daniel Pimentel-Alarcón

Integer Programming Approaches To Subspace Clustering With Missing Data Spotlight

Akhilesh Soni · Daniel Pimentel-Alarcón

Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks Poster

Jeffery Kline · Joseph Bockhorst

Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks Spotlight

Jeffery Kline · Joseph Bockhorst

Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes Poster

Abdurakhmon Sadiev · Ekaterina Borodich · Darina Dvinskikh · Aleksandr Beznosikov · Alexander Gasnikov

Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes Spotlight

Abdurakhmon Sadiev · Ekaterina Borodich · Darina Dvinskikh · Aleksandr Beznosikov · Alexander Gasnikov

Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds Poster

Pascal Esser · Frank Nielsen

Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds Spotlight

Pascal Esser · Frank Nielsen

Spherical Perspective on Learning with Normalization Layers Poster

Simon Roburin · Yann de Mont-Marin · Andrei Bursuc · Renaud Marlet · Patrick Pérez · Mathieu Aubry

Spherical Perspective on Learning with Normalization Layers Spotlight

Simon Roburin · Yann de Mont-Marin · Andrei Bursuc · Renaud Marlet · Patrick Pérez · Mathieu Aubry

Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective Poster

Neha Wadia · Michael Jordan · Michael Muehlebach

Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective Spotlight

Neha Wadia · Michael Jordan · Michael Muehlebach

Better Linear Rates for SGD with Data Shuffling Poster

Grigory Malinovsky · Alibek Sailanbayev · Peter Richtarik

Better Linear Rates for SGD with Data Shuffling Spotlight

Grigory Malinovsky · Alibek Sailanbayev · Peter Richtarik

Fast, Exact Subsampled Natural Gradients and First-Order KFAC Poster

Frederik Benzing

Fast, Exact Subsampled Natural Gradients and First-Order KFAC Spotlight

Frederik Benzing

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization Poster

Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization Spotlight

Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu

DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization Poster

Boyue Li · Zhize Li · Yuejie Chi

DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization Spotlight

Boyue Li · Zhize Li · Yuejie Chi

Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers Poster

Jacques Chen · Frederik Kunstner · Mark Schmidt

Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers Spotlight

Jacques Chen · Frederik Kunstner · Mark Schmidt

Acceleration and Stability of the Stochastic Proximal Point Algorithm Poster

Junhyung Lyle Kim · Panos Toulis · Anastasios Kyrillidis

Acceleration and Stability of the Stochastic Proximal Point Algorithm Spotlight

Junhyung Lyle Kim · Panos Toulis · Anastasios Kyrillidis

Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes Poster

Hamed Jalali · Gjergji Kasneci

DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning Poster

Robert Hönig · Yiren Zhao · Robert Mullins

Barzilai and Borwein conjugate gradient method equipped with a non-monotone line search technique Poster

Sajad Fathi Hafshejani · Daya Gaur · Shahadat Hossain · Robert Benkoczi

Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training Poster

Maximus Mutschler · Andreas Zell

Community-based Layerwise Distributed Training of Graph Convolutional Networks Poster

Hongyi Li · Junxiang Wang · Yongchao Wang · Yue Cheng · Liang Zhao

Optimum-statistical Collaboration Towards Efficient Black-box Optimization Poster

Wenjie Li · Chi-Hua Wang · Guang Cheng

COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization Poster

Manuel Madeira · Renato Negrinho · Joao Xavier · Pedro Aguiar

Stochastic Learning Equation using Monotone Increasing Resolution of Quantization Poster

Jinwuk Seok

Sign-RIP: A Robust Restricted Isometry Property for Low-rank Matrix Recovery Poster

Jianhao Ma · Salar Fattahi

Practice-Consistent Analysis of Adam-Style Methods Poster

Zhishuai Guo · Yi Xu · Wotao Yin · Rong Jin · Tianbao Yang

Towards Robust and Automatic Hyper-Parameter Tunning Poster

Mathieu Tuli · Mahdi Hosseini · Konstantinos N Plataniotis

Random-reshuffled SARAH does not need a full gradient computations Poster

Aleksandr Beznosikov · Martin Takac

Shifted Compression Framework: Generalizations and Improvements Poster

Egor Shulgin · Peter Richtarik

A New Scheme for Boosting with an Average Margin Distribution Oracle Poster

Ryotaro Mitsuboshi · Kohei Hatano · Eiji Takimoto

The Geometric Occam Razor Implicit in Deep Learning Poster

Benoit Dherin · Michael Munn · David Barrett

Escaping Local Minima With Stochastic Noise Poster

Harshvardhan Harshvardhan · Sebastian Stich

Faking Interpolation Until You Make It Poster

Alasdair Paren · Rudra Poudel · Pawan K Mudigonda

High Probability Step Size Lower Bound for Adaptive Stochastic Optimization Poster

Katya Scheinberg · Miaolan Xie

Adaptive Optimization with Examplewise Gradients Poster

Julius Kunze · James Townsend · David Barber

Structured Low-Rank Tensor Learning Poster

Jayadev Naram · Tanmay Sinha · Pawan Kumar

ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method Poster

Zhize Li

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback Poster

Peter Richtarik · Igor Sokolov · Ilyas Fatkhullin · Eduard Gorbunov · Zhize Li

Stochastic Polyak Stepsize with a Moving Target Poster

Robert Gower · Aaron Defazio · Mike Rabbat

Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations Poster

Tatjana Chavdarova · Michael Jordan · Emmanouil Zampetakis

Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent Poster

Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad Harikandeh

On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics Poster

Grigory Malinovsky · Konstantin Mishchenko · Peter Richtarik

A Stochastic Momentum Method for Min-max Bilevel Optimization Poster

Quanqi Hu · Bokun Wang · Tianbao Yang

Deep Neural Networks pruning via the Structured Perspective Regularization Poster

Matteo Cacciola · Andrea Lodi · Xinlin Li

Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization Poster

Yuanlu Bai · Svitlana Vyetrenko · Henry Lam · Tucker Balch

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima Poster

Zixiang Chen · Dongruo Zhou · Quanquan Gu

Adam vs. SGD: Closing the generalization gap on image classification Poster

Aman Gupta · Rohan Ramanath · Jun Shi · Sathiya Keerthi

Simulated Annealing for Neural Architecture Search Poster

Shentong Mo · Jingfei Xia · Pinxu Ren

Faster Quasi-Newton Methods for Linear Composition Problems Poster

Betty Shea · Mark Schmidt

On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging Poster

Chris Junchi Li · Yaodong Yu · Nicolas Loizou · Gauthier Gidel · Yi Ma · Nicolas Le Roux · Michael Jordan

On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging Oral

Chris Junchi Li · Yaodong Yu · Nicolas Loizou · Gauthier Gidel · Yi Ma · Nicolas Le Roux · Michael Jordan

On the Relation between Distributionally Robust Optimization and Data Curation Poster

Agnieszka Słowik · Leon Bottou

On the Relation between Distributionally Robust Optimization and Data Curation Oral

Agnieszka Słowik · Leon Bottou

Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence Poster

Wenhao Zhan · Shicong Cen · Baihe Huang · Yuxin Chen · Jason Lee · Yuejie Chi

Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence Oral

Wenhao Zhan · Shicong Cen · Baihe Huang · Yuxin Chen · Jason Lee · Yuejie Chi

Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation Poster

Futong Liu · Tao Lin · Martin Jaggi

Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation Oral

Futong Liu · Tao Lin · Martin Jaggi