Workshop
Mon Dec 13 03:15 AM -- 02:35 PM (PST)
OPT 2021: Optimization for Machine Learning
Courtney Paquette · Quanquan Gu · Oliver Hinder · Katya Scheinberg · Sebastian Stich · Martin Takac





OPT 2021 will bring together experts in optimization to share their perspectives, alongside crossover experts in ML sharing their views and recent advances. OPT 2021 honors the tradition of bringing together people from optimization and from ML in order to promote and generate new interactions between the two communities.

To foster the spirit of innovation and collaboration, a goal of this workshop, OPT 2021 will focus the contributed talks on research in “Beyond Worst-case Complexity”. Classical optimization analyses measure the performance of an algorithm by (1) its computational cost and (2) its convergence guarantees for any input. Yet algorithms with worse traditional complexity (e.g., SGD and its variants, Adam, etc.) are increasingly popular in practice for training deep neural networks and other ML tasks. This raises questions such as: what are good modeling assumptions for ML problems under which to measure an optimization algorithm’s success, and how can we leverage these to better understand the performance of known (and new) algorithms? For instance, typical optimization problems in ML may be better conditioned than their worst-case counterparts, in part because the problems are highly structured and/or high-dimensional (large number of features/samples). One could leverage this observation to design algorithms with better “average-case” complexity. Moreover, a growing body of research indicates an intimate connection between the optimization algorithm and how well it performs on the test data (generalization). This new area of research in ML and its deep ties to optimization warrant discussion between the two communities. Specifically, we aim to continue the discussion on the precise meaning of generalization and average-case complexity, and to formalize what this means for optimization algorithms. By bringing together experts in both fields, OPT 2021 will foster insightful discussions around these topics and more.
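To make the discussion concrete, here is a minimal sketch (not from any workshop talk) of the kind of algorithm the description refers to: plain SGD on a synthetic least-squares problem. The data, step size, and epoch count are illustrative assumptions; despite weak worst-case guarantees under a constant step size, the iterate settles near the true parameter on this well-conditioned instance.

```python
import random

random.seed(0)

# Synthetic 1-D least-squares data: y = 3*x plus small Gaussian noise.
xs = [random.uniform(-1, 1) for _ in range(200)]
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in xs]

w = 0.0    # parameter to learn; the true value is 3.0
lr = 0.1   # constant step size (illustrative choice)

for epoch in range(50):
    random.shuffle(data)          # reshuffle each epoch, as in practice
    for x, y in data:
        grad = 2 * (w * x - y) * x  # gradient of (w*x - y)^2 w.r.t. w
        w -= lr * grad              # stochastic gradient step

print(w)  # close to 3.0, up to noise and step-size fluctuation
```

The same loop run on an ill-conditioned or adversarial instance can behave far worse, which is exactly the gap between worst-case and average-case analyses that the contributed talks address.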

Welcome event (gather.town) (Social event/Break)
Opening Remarks to Session 1 (Organizer intro)
Deep Learning: Success, Failure, and the Border between them, Shai Shalev-Shwartz (Plenary Speaker)
Q&A with Shai Shalev-Shwartz (Q&A)
Learning with Strange Gradients, Martin Jaggi (Plenary Speaker)
Q&A with Martin Jaggi (Q&A)
Contributed Talks in Session 1 (Zoom) (Orals and spotlights)
Poster Session 1 (gather.town) (Poster session)
Break (gather.town) (Break)
Opening Remarks to Session 2 (Organizer intro)
The global optimization of functions with low effective dimension - better than worst-case?, Coralia Cartis (Plenary Speaker)
Q&A with Coralia Cartis (Q&A)
Non-Euclidean Differentially Private Stochastic Convex Optimization, Cristóbal Guzmán (Plenary Speaker)
Q&A with Cristóbal Guzmán (Q&A)
Contributed Talks in Session 2 (Zoom) (Orals and spotlights)
Break
Opening Remarks to Session 3 (Organizer intro)
Avoiding saddle points in nonsmooth optimization, Damek Davis (Plenary Speaker)
Q&A with Damek Davis (Q&A)
Faster Empirical Risk Minimization, Jelena Diakonikolas (Plenary Speaker)
Q&A with Jelena Diakonikolas (Q&A)
Contributed talks in Session 3 (Zoom) (Orals and spotlights)
Poster Session 2 (gather.town) (Poster session)
Break (gather.town) (Break)
Opening Remarks to Session 4 (Organizer intro)
Online Learning via Linear Programming, Yinyu Ye (Plenary Speaker)
Q&A with Yinyu Ye (Q&A)
Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney (Plenary Speaker)
Q&A with Michael Mahoney (Q&A)
Contributed talks in Session 4 (Zoom) (Orals and spotlights)
Closing remarks (Organizer closing)
Random-reshuffled SARAH does not need full gradient computations (Poster)
EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback (Poster)
Faking Interpolation Until You Make It (Poster)
On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics (Poster)
Acceleration and Stability of the Stochastic Proximal Point Algorithm (Spotlight)
ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method (Poster)
Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective (Poster)
Fast, Exact Subsampled Natural Gradients and First-Order KFAC (Poster)
Community-based Layerwise Distributed Training of Graph Convolutional Networks (Poster)
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds (Spotlight)
Fast, Exact Subsampled Natural Gradients and First-Order KFAC (Spotlight)
Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes (Poster)
The Geometric Occam's Razor Implicit in Deep Learning (Poster)
COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization (Poster)
Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent (Poster)
On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging (Poster)
Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization (Poster)
Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations (Poster)
DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning (Poster)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation (Poster)
Acceleration and Stability of the Stochastic Proximal Point Algorithm (Poster)
Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks (Poster)
Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks (Spotlight)
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (Poster)
On the Relation between Distributionally Robust Optimization and Data Curation (Oral)
Deep Neural Networks pruning via the Structured Perspective Regularization (Poster)
On the Relation between Distributionally Robust Optimization and Data Curation (Poster)
Integer Programming Approaches To Subspace Clustering With Missing Data (Poster)
Optimum-statistical Collaboration Towards Efficient Black-box Optimization (Poster)
A New Scheme for Boosting with an Average Margin Distribution Oracle (Poster)
Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes (Spotlight)
Stochastic Polyak Stepsize with a Moving Target (Poster)
On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging (Oral)
A Stochastic Momentum Method for Min-max Bilevel Optimization (Poster)
Faster Quasi-Newton Methods for Linear Composition Problems (Poster)
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds (Poster)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (Poster)
Spherical Perspective on Learning with Normalization Layers (Poster)
Structured Low-Rank Tensor Learning (Poster)
Faster Perturbed Stochastic Gradient Methods for Finding Local Minima (Poster)
Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective (Spotlight)
Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence (Poster)
Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes (Poster)
Stochastic Learning Equation using Monotone Increasing Resolution of Quantization (Poster)
Escaping Local Minima With Stochastic Noise (Poster)
Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training (Poster)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (Spotlight)
Simulated Annealing for Neural Architecture Search (Poster)
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers (Poster)
Barzilai and Borwein conjugate gradient method equipped with a non-monotone line search technique (Poster)
Spherical Perspective on Learning with Normalization Layers (Spotlight)
Adam vs. SGD: Closing the generalization gap on image classification (Poster)
High Probability Step Size Lower Bound for Adaptive Stochastic Optimization (Poster)
Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence (Oral)
Better Linear Rates for SGD with Data Shuffling (Spotlight)
Shifted Compression Framework: Generalizations and Improvements (Poster)
Towards Robust and Automatic Hyper-Parameter Tuning (Poster)
Integer Programming Approaches To Subspace Clustering With Missing Data (Spotlight)
Better Linear Rates for SGD with Data Shuffling (Poster)
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (Spotlight)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation (Oral)
Practice-Consistent Analysis of Adam-Style Methods (Poster)
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers (Spotlight)
Sign-RIP: A Robust Restricted Isometry Property for Low-rank Matrix Recovery (Poster)
Adaptive Optimization with Examplewise Gradients (Poster)