Workshop
Mon Dec 13 03:15 AM -- 02:35 PM (PST)
OPT 2021: Optimization for Machine Learning
Courtney Paquette · Quanquan Gu · Oliver Hinder · Katya Scheinberg · Sebastian Stich · Martin Takac

Workshop Home Page

OPT 2021 brings together experts in optimization to share their perspectives, along with experts working at the crossover with ML to share their views and recent advances. OPT 2021 honors the tradition of bringing together people from optimization and from ML in order to promote and generate new interactions between the two communities.

To foster the spirit of innovation and collaboration that is a goal of this workshop, OPT 2021 will focus the contributed talks on research in “Beyond Worst-case Complexity”. Classical optimization analyses measure the performance of an algorithm by (1) its computational cost and (2) its convergence guarantees for any input. Yet algorithms with worse traditional complexity (e.g., SGD and its variants, ADAM, etc.) are increasingly popular in practice for training deep neural networks and other ML tasks. This raises questions such as: what are good modeling assumptions for ML problems under which to measure an optimization algorithm’s success, and how can we leverage these to better understand the performance of known (and new) algorithms? For instance, typical optimization problems in ML may be better conditioned than their worst-case counterparts, in part because the problems are highly structured and/or high-dimensional (large number of features/samples). One could leverage this observation to design algorithms with better “average-case” complexity. Moreover, a growing body of research indicates an intimate connection between the optimization algorithm and how well the trained model performs on test data (generalization). This new area of research in ML and its deep ties to optimization warrant a discussion between the two communities. Specifically, we aim to continue the discussion on the precise meaning of generalization and average-case complexity and to formalize what this means for optimization algorithms. By bringing together experts in both fields, OPT 2021 will foster insightful discussions around these topics and more.
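As a rough illustration of the worst-case vs. average-case distinction discussed above (not part of the workshop program), the following Python sketch runs gradient descent on two quadratics: one with an adversarially spread spectrum and one with a random (Wishart) spectrum of the kind average-case analyses consider. The dimensions, condition number, and step count are illustrative assumptions, not drawn from any of the talks.

import numpy as np

def gd_error(A, steps=200):
    """Run gradient descent on f(x) = 0.5 * x^T A x and return the final error ||x||."""
    d = A.shape[0]
    L = np.linalg.eigvalsh(A).max()   # largest eigenvalue = smoothness constant
    x = np.ones(d) / np.sqrt(d)       # fixed unit-norm starting point
    for _ in range(steps):
        x = x - (1.0 / L) * (A @ x)   # gradient step with step size 1/L
    return np.linalg.norm(x)

rng = np.random.default_rng(0)
d = 200

# Worst-case style instance: eigenvalues spread all the way down to 1e-4,
# giving condition number 1e4 (illustrative choice).
worst = np.diag(np.linspace(1e-4, 1.0, d))

# Average-case style instance: a random sample-covariance (Wishart) matrix,
# whose spectrum concentrates away from zero rather than being adversarial.
X = rng.standard_normal((5 * d, d)) / np.sqrt(5 * d)
avg = X.T @ X

print("final error, worst-case spectrum:   ", gd_error(worst))
print("final error, average-case spectrum: ", gd_error(avg))

On the random instance the condition number concentrates near a modest constant, so the same number of gradient steps drives the error far lower than on the ill-conditioned worst-case instance; this is the kind of gap between worst-case and average-case behavior the workshop theme refers to.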

Welcome event (gather.town) (Social event/Break)
Opening Remarks to Session 1 (Organizer intro)
Deep Learning: Success, Failure, and the Border between them, Shai Shalev-Shwartz (Plenary Speaker)
Q&A with Shai Shalev-Shwartz (Q&A)
Learning with Strange Gradients, Martin Jaggi (Plenary Speaker)
Q&A with Martin Jaggi (Q&A)
Contributed Talks in Session 1 (Zoom) (Orals and spotlights)
Poster Session 1 (gather.town) (Poster session)
Break (gather.town) (Break)
Opening Remarks to Session 2 (Organizer intro)
The global optimization of functions with low effective dimension - better than worst-case?, Coralia Cartis (Plenary Speaker)
Q&A with Coralia Cartis (Q&A)
Non-Euclidean Differentially Private Stochastic Convex Optimization, Cristóbal Guzmán (Plenary Speaker)
Q&A with Cristóbal Guzmán (Q&A)
Contributed Talks in Session 2 (Zoom) (Orals and spotlights)
Break
Opening Remarks to Session 3 (Organizer intro)
Avoiding saddle points in nonsmooth optimization, Damek Davis (Plenary Speaker)
Q&A with Damek Davis (Q&A)
Faster Empirical Risk Minimization, Jelena Diakonikolas (Plenary Speaker)
Q&A with Jelena Diakonikolas (Q&A)
Contributed talks in Session 3 (Zoom) (Orals and spotlights)
Poster Session 2 (gather.town) (Poster session)
Break (gather.town) (Break)
Opening Remarks to Session 4 (Organizer intro)
Online Learning via Linear Programming, Yinyu Ye (Plenary Speaker)
Q&A with Yinyu Ye (Q&A)
Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney (Plenary Speaker)
Q&A with Michael Mahoney (Q&A)
Contributed talks in Session 4 (Zoom) (Orals and spotlights)
Closing remarks (Organizer closing)
Shifted Compression Framework: Generalizations and Improvements (Poster)
The Geometric Occam's Razor Implicit in Deep Learning (Poster)
Faking Interpolation Until You Make It (Poster)
High Probability Step Size Lower Bound for Adaptive Stochastic Optimization (Poster)
Adaptive Optimization with Examplewise Gradients (Poster)
ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method (Poster)
EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback (Poster)
Stochastic Polyak Stepsize with a Moving Target (Poster)
A Stochastic Momentum Method for Min-max Bilevel Optimization (Poster)
Faster Perturbed Stochastic Gradient Methods for Finding Local Minima (Poster)
Adam vs. SGD: Closing the generalization gap on image classification (Poster)
Simulated Annealing for Neural Architecture Search (Poster)
Faster Quasi-Newton Methods for Linear Composition Problems (Poster)
On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging (Poster)
On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging (Oral)
On the Relation between Distributionally Robust Optimization and Data Curation (Poster)
On the Relation between Distributionally Robust Optimization and Data Curation (Oral)
Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence (Poster)
Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence (Oral)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation (Oral)
Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes (Poster)
Better Linear Rates for SGD with Data Shuffling (Spotlight)
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (Poster)
Acceleration and Stability of the Stochastic Proximal Point Algorithm (Spotlight)
Community-based Layerwise Distributed Training of Graph Convolutional Networks (Poster)
Optimum-statistical Collaboration Towards Efficient Black-box Optimization (Poster)
Sign-RIP: A Robust Restricted Isometry Property for Low-rank Matrix Recovery (Poster)
Random-reshuffled SARAH does not need full gradient computations (Poster)
Escaping Local Minima With Stochastic Noise (Poster)
Structured Low-Rank Tensor Learning (Poster)
Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent (Poster)
Deep Neural Networks pruning via the Structured Perspective Regularization (Poster)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation (Poster)
Fast, Exact Subsampled Natural Gradients and First-Order KFAC (Spotlight)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (Spotlight)
Better Linear Rates for SGD with Data Shuffling (Poster)
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (Spotlight)
Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization (Poster)
Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations (Poster)
A New Scheme for Boosting with an Average Margin Distribution Oracle (Poster)
Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes (Spotlight)
Spherical Perspective on Learning with Normalization Layers (Poster)
On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics (Poster)
Integer Programming Approaches To Subspace Clustering With Missing Data (Poster)
Integer Programming Approaches To Subspace Clustering With Missing Data (Spotlight)
Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks (Poster)
Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks (Spotlight)
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds (Poster)
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds (Spotlight)
Spherical Perspective on Learning with Normalization Layers (Spotlight)
Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective (Poster)
Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective (Spotlight)
Fast, Exact Subsampled Natural Gradients and First-Order KFAC (Poster)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (Poster)
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers (Poster)
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers (Spotlight)
Acceleration and Stability of the Stochastic Proximal Point Algorithm (Poster)
Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes (Poster)
DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning (Poster)
Barzilai and Borwein conjugate gradient method equipped with a non-monotone line search technique (Poster)
Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training (Poster)
COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization (Poster)
Stochastic Learning Equation using Monotone Increasing Resolution of Quantization (Poster)
Practice-Consistent Analysis of Adam-Style Methods (Poster)
Towards Robust and Automatic Hyper-Parameter Tuning (Poster)