Workshop
Mon Dec 13 03:15 AM -- 02:35 PM (PST)
OPT 2021: Optimization for Machine Learning
Courtney Paquette · Quanquan Gu · Oliver Hinder · Katya Scheinberg · Sebastian Stich · Martin Takac

OPT 2021 will bring together experts in optimization to share their perspectives, alongside experts working at the intersection with ML who will share their views and recent advances. OPT 2021 honors this tradition of bringing together people from optimization and from ML in order to promote and generate new interactions between the two communities.

To foster the spirit of innovation and collaboration, a goal of this workshop, OPT 2021 will focus the contributed talks on research in “Beyond Worst-case Complexity”. Classical optimization analyses measure the performance of an algorithm by (1) its computational cost and (2) its convergence guarantees, which must hold for any input to the algorithm. Yet algorithms with worse complexity under these classical measures (e.g., SGD and its variants, Adam) are increasingly popular in practice for training deep neural networks and other ML tasks. This raises questions such as: what are good modeling assumptions for ML problems under which to measure an optimization algorithm’s success, and how can we leverage these assumptions to better understand the performance of known (and new) algorithms? For instance, typical optimization problems in ML may be better conditioned than their worst-case counterparts, in part because the problems are highly structured and/or high-dimensional (large number of features/samples). One could leverage this observation to design algorithms with better “average-case” complexity. Moreover, a growing body of research indicates an intimate connection between the optimization algorithm and how well the trained model performs on test data (generalization). This new area of research in ML and its deep ties to optimization warrant a discussion between the two communities. Specifically, we aim to continue the discussion on the precise meaning of generalization and average-case complexity and to formalize what these mean for optimization algorithms. By bringing together experts in both fields, OPT 2021 will foster insightful discussions around these topics and more.
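The gap between worst-case guarantees and typical behavior is easy to see even on a toy problem. The sketch below is an illustration added here, not part of the workshop materials; the random least-squares instance, problem sizes, step count, and seed are assumptions chosen only for demonstration. It runs plain gradient descent on a randomly generated least-squares problem and compares the observed suboptimality with the textbook worst-case bound for L-smooth convex functions.

```python
# Illustrative sketch (assumed setup, not from the workshop): gradient descent on a
# random least-squares problem typically converges far faster than its worst-case
# bound predicts, echoing the "average-case vs. worst-case" theme above.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 100                       # samples, features (illustrative sizes)

# Random instance standing in for a "typical", well-conditioned ML problem.
A = rng.standard_normal((n, d)) / np.sqrt(n)
b = rng.standard_normal(n)

H = A.T @ A                           # Hessian of f(x) = 0.5 * ||Ax - b||^2
L = np.linalg.eigvalsh(H).max()       # smoothness constant (largest eigenvalue)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
f_star = f(x_star)

# Gradient descent with the classical step size 1/L, started at the origin.
steps = 200
x = np.zeros(d)
for _ in range(steps):
    x -= (1.0 / L) * (A.T @ (A @ x - b))

observed_gap = f(x) - f_star
# Textbook worst-case bound for L-smooth convex functions:
#   f(x_k) - f* <= L * ||x_0 - x*||^2 / (2k).
worst_case_bound = L * np.linalg.norm(x_star) ** 2 / (2 * steps)

print(f"observed suboptimality after {steps} steps: {observed_gap:.3e}")
print(f"worst-case bound at the same step count:   {worst_case_bound:.3e}")
```

On such random instances the observed gap is many orders of magnitude below the worst-case bound, which is the kind of discrepancy the “average-case complexity” discussion aims to formalize.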

Welcome event (gather.town) (Social event/Break)
Opening Remarks to Session 1 (Organizer intro)
Deep Learning: Success, Failure, and the Border between them, Shai Shalev-Shwartz (Plenary Speaker)
Q&A with Shai Shalev-Shwartz (Q&A)
Learning with Strange Gradients, Martin Jaggi (Plenary Speaker)
Q&A with Martin Jaggi (Q&A)
Contributed Talks in Session 1 (Zoom) (Orals and spotlights)
Poster Session 1 (gather.town) (Poster session)
Break (gather.town) (Break)
Opening Remarks to Session 2 (Organizer intro)
The global optimization of functions with low effective dimension - better than worst-case?, Coralia Cartis (Plenary Speaker)
Q&A with Coralia Cartis (Q&A)
Non-Euclidean Differentially Private Stochastic Convex Optimization, Cristóbal Guzmán (Plenary Speaker)
Q&A with Cristóbal Guzmán (Q&A)
Contributed Talks in Session 2 (Zoom) (Orals and spotlights)
Break
Opening Remarks to Session 3 (Organizer intro)
Avoiding saddle points in nonsmooth optimization, Damek Davis (Plenary Speaker)
Q&A with Damek Davis (Q&A)
Faster Empirical Risk Minimization, Jelena Diakonikolas (Plenary Speaker)
Q&A with Jelena Diakonikolas (Q&A)
Contributed talks in Session 3 (Zoom) (Orals and spotlights)
Poster Session 2 (gather.town) (Poster session)
Break (gather.town) (Break)
Opening Remarks to Session 4 (Organizer intro)
Online Learning via Linear Programming, Yinyu Ye (Plenary Speaker)
Q&A with Yinyu Ye (Q&A)
Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney (Plenary Speaker)
Q&A with Michael Mahoney (Q&A)
Contributed talks in Session 4 (Zoom) (Orals and spotlights)
Closing remarks (Organizer closing)
Better Linear Rates for SGD with Data Shuffling (Spotlight)
COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization (Poster)
Practice-Consistent Analysis of Adam-Style Methods (Poster)
Towards Robust and Automatic Hyper-Parameter Tunning (Poster)
Structured Low-Rank Tensor Learning (Poster)
Integer Programming Approaches To Subspace Clustering With Missing Data (Spotlight)
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (Spotlight)
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers (Spotlight)
Fast, Exact Subsampled Natural Gradients and First-Order KFAC (Spotlight)
Sign-RIP: A Robust Restricted Isometry Property for Low-rank Matrix Recovery (Poster)
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds (Poster)
Better Linear Rates for SGD with Data Shuffling (Poster)
Acceleration and Stability of Stochastic Proximal Point Algorithm (Spotlight)
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (Poster)
Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective (Poster)
Fast, Exact Subsampled Natural Gradients and First-Order KFAC (Poster)
Community-based Layerwise Distributed Training of Graph Convolutional Networks (Poster)
Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence (Poster)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation (Oral)
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds (Spotlight)
Optimum-statistical Collaboration Towards Efficient Black-box Optimization (Poster)
Escaping Local Minima With Stochastic Noise (Poster)
Faking Interpolation Until You Make It (Poster)
Stochastic Polyak Stepsize with a Moving Target (Poster)
Stochastic Learning Equation using Monotone Increasing Resolution of Quantization (Poster)
Integer Programming Approaches To Subspace Clustering With Missing Data (Poster)
Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes (Spotlight)
Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes (Poster)
A New Scheme for Boosting with an Average Margin Distribution Oracle (Poster)
EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback (Poster)
Adaptive Optimization with Examplewise Gradients (Poster)
On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging (Oral)
DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning (Poster)
A Stochastic Momentum Method for Min-max Bilevel Optimization (Poster)
ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method (Poster)
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers (Poster)
On the Relation between Distributionally Robust Optimization and Data Curation (Poster)
Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks (Poster)
Spherical Perspective on Learning with Normalization Layers (Poster)
Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks (Spotlight)
Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent (Poster)
Simulated Annealing for Neural Architecture Search (Poster)
Deep Neural Networks pruning via the Structured Perspective Regularization (Poster)
Spherical Perspective on Learning with Normalization Layers (Spotlight)
Faster Quasi-Newton Methods for Linear Composition Problems (Poster)
On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging (Poster)
Adam vs. SGD: Closing the generalization gap on image classification (Poster)
High Probability Step Size Lower Bound for Adaptive Stochastic Optimization (Poster)
On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics (Poster)
Faster Perturbed Stochastic Gradient Methods for Finding Local Minima (Poster)
Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization (Poster)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (Spotlight)
Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective (Spotlight)
Barzilai and Borwein conjugate gradient method equipped with a non-monotone line search technique (Poster)
Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes (Poster)
The Geometric Occam's Razor Implicit in Deep Learning (Poster)
Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations (Poster)
Random-reshuffled SARAH does not need a full gradient computations (Poster)
Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training (Poster)
Acceleration and Stability of Stochastic Proximal Point Algorithm (Poster)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (Poster)
Shifted Compression Framework: Generalizations and Improvements (Poster)
On the Relation between Distributionally Robust Optimization and Data Curation (Oral)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation (Poster)
Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence (Oral)