Workshop
OPT 2023: Optimization for Machine Learning
Cristóbal Guzmán · Courtney Paquette · Katya Scheinberg · Aaron Sidford · Sebastian Stich
Hall D2 (level 1)
Optimization lies at the heart of many machine learning algorithms and enjoys great interest in our community. Indeed, this intimate relation of optimization with ML is the key motivation for the OPT series of workshops. We aim to foster discussion, discovery, and dissemination of state-of-the-art research in optimization relevant to ML.
In keeping with the workshop's spirit of innovation and collaboration, OPT 2023 will focus the contributed talks on research in "Optimization in the Wild": a title meant to encompass the new challenges that traditional optimization theory and algorithms face with the growth and variety of novel ML applications.
Successful applications of both theory and algorithms from optimization to ML frequently require a profound redesign or even entirely new approaches. This becomes apparent in settings where the classical (empirical) risk minimization approach is no longer sufficient to address the challenges of learning. As motivating examples, we consider the case of learning under (group or individual) fairness in distributed scenarios, learning under differential privacy, robustness, multi-task and transfer learning, as well as sampling from log-concave distributions. On the other hand, novel neural network architectures (such as transformers) require exploiting their structure for efficient optimization in crucial ways. For these models and problems: What is the role of optimization? What synergies can be exploited with the insights coming from these particular areas towards more efficient and reliable solutions? We will foster discussions directed at developing an understanding of these challenges, and at raising awareness of the capabilities and risks of using optimization in each of these areas.
Schedule
Fri 7:00 a.m. – 7:01 a.m.

Opening Remarks
Cristóbal Guzmán

Fri 7:00 a.m. – 7:30 a.m.

DoG is SGD's best friend: toward tuning-free stochastic optimization, Yair Carmon (Plenary speaker)
Abstract: While stochastic optimization methods drive continual improvements in machine learning, choosing the optimization parameters, and particularly the learning rate (LR), remains a difficulty. In this talk, I will describe our work on removing LR tuning from stochastic gradient descent (SGD), culminating in a tuning-free dynamic SGD step-size formula, which we call Distance over Gradients (DoG). We show that DoG removes the need to tune the learning rate both theoretically (obtaining strong parameter-free convergence guarantees) and empirically (performing nearly as well as expensively tuned SGD on neural network training tasks).
Yair Carmon

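The abstract names the formula: the step size is a ratio of distance travelled to accumulated gradient norms. A minimal one-parameter sketch of that idea, with our own notation and a small initial-movement seed `r_eps` (this is an illustration, not the authors' code):

```python
import math

def dog_sgd(grad, x0, steps, r_eps=1e-4):
    """Sketch of a Distance-over-Gradients style step size:
    eta_t = (max distance travelled from x0) / sqrt(sum of squared gradients)."""
    x = x0
    max_dist = r_eps          # seed the "distance travelled" with a tiny value
    grad_sq_sum = 0.0         # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += g * g
        eta = max_dist / math.sqrt(grad_sq_sum + 1e-12)
        x = x - eta * g
        max_dist = max(max_dist, abs(x - x0))
    return x

# minimize f(x) = (x - 3)^2 starting from x0 = 0, with no learning rate supplied
x_star = dog_sgd(lambda x: 2.0 * (x - 3.0), x0=0.0, steps=2000)
```

The point of the construction is that no learning rate or distance-to-optimum estimate is supplied; the seed `r_eps` only influences the first few iterations.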
Fri 7:30 a.m. – 8:00 a.m.

Contributed Talks 1: *Escaping mediocrity: how two-layer networks learn hard generalized linear models* and *Last Iterate Convergence of Popov Method for Non-monotone Stochastic Variational Inequalities* (Contributed talks)
Two 15-minute talks.
Bruno Loureiro · Daniil Vankov · Courtney Paquette

Fri 8:00 a.m. – 9:00 a.m.

Poster Session 1 (Break)
Posters in this session:

Egor Shulgin · Mingzhen He · Hanmin Li · Thibault Lahire · Eric Zelikman · Damien Scieur · Rajat Vadiraj Dwaraknath · Gene Li · Zhanhong Jiang · Rahul Jain · Zihan Zhou · Tianyue Zhang · Ilyas Fatkhullin · Frederik Kunstner · Utkarsh Singhal · Bruno Loureiro · Krishna C Kalagarla · Kai Liu · Michal Derezinski · Ross Clarke · Dimitri Papadimitriou · Mo Zhou · Jörg Franke · Chandler Smith · Darshan Chakrabarti · Trang H. Tran · Mokhwa Lee · Wei Kuang · Vincent Roulet · John Lazarsfeld · Donghyun Oh · Yihe Deng · Fu Wang · Junchi YANG · Dániel Rácz · Jeffrey Flanigan · Aaron Mishkin · Luca Scharr · Robert Gower · Chaoyue Liu · Yushen Huang · Nicholas Recker

Fri 9:00 a.m. – 9:30 a.m.

Contributed Talks 2: *An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization* and *Practical Principled Policy Optimization for Finite MDPs* (Contributed talks)
Two 15-minute talks.
Guy Kornowski · Michael Lu · Aaron Sidford

Fri 9:30 a.m. – 10:00 a.m.

Aiming towards the minimizers: fast convergence of SGD for overparameterized problems, Dmitriy Drusvyatskiy (Plenary speaker)
Abstract: Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a small minibatch of sampled gradients in each iteration. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
Dmitriy Drusvyatskiy

Fri 10:00 a.m. – 12:00 p.m.

Lunch (Break)

Fri 12:00 p.m. – 12:30 p.m.

Evaluating Large-Scale Learning Systems, Virginia Smith (Plenary speaker)
Abstract: To deploy machine learning models in practice, it is critical to have a way to reliably evaluate their effectiveness. Unfortunately, the scale and complexity of modern machine learning systems make it difficult to provide faithful evaluations and gauge performance across potential deployment scenarios. In this talk I discuss our work addressing challenges in large-scale ML evaluation. First, I explore the problem of hyperparameter optimization in federated networks of devices, where issues of device subsampling, heterogeneity, and privacy can introduce noise in the evaluation process and make it challenging to effectively perform optimization. Second, I present ReLM, a system for validating and querying large language models (LLMs). Although LLMs have been touted for their ability to generate natural-sounding text, there is a growing need to evaluate the behavior of LLMs in light of issues such as data memorization, bias, and inappropriate language. ReLM poses LLM validation queries as regular expressions to enable faster and more effective LLM evaluation.
Virginia Smith

Fri 12:30 p.m. – 1:00 p.m.

Contributed Talks 3: *Dueling Optimization with a Monotone Adversary* and *High-Dimensional Prediction for Sequential Decision Making* (Contributed talks)
Two 15-minute talks.
Naren Manoj · Georgy Noarov · Cristóbal Guzmán

Fri 1:00 p.m. – 2:00 p.m.

Poster Session 2 (Break)
Posters in this session:

Xiao-Yang Liu · Guy Kornowski · Philipp Dahlinger · Abbas Ehsanfar · Binyamin Perets · David Martinez-Rubio · Sudeep Raja Putta · Runlong Zhou · Connor Lawless · Julian J Stier · Chen Fan · Michal Šustr · James Spann · Jung Hun Oh · Yao Xie · Qi Zhang · Krishna Acharya · Sourabh Medapati · Sharan Vaswani · Sruthi Gorantla · Mohamed Elsayed · Hongyang Zhang · Reza Asad · Viktor Pavlovic · Betty Shea · Georgy Noarov · Chuan He · Daniil Vankov · Taoan Huang · Michael Lu · Anant Mathur · Konstantin Mishchenko · Stanley Wei · Francesco Faccio · Yuchen Zeng · Tianyue Zhang · Chris Junchi Li · Aaron Mishkin · Sina Baharlouei · Chen Xu · Sasha Abramowitz · Sebastian Stich · Felix Dangel

Fri 2:00 p.m. – 2:30 p.m.

Sharply predicting the behavior of complex iterative algorithms with random data, Ashwin Pananjady (Plenary speaker)
Abstract: Iterative algorithms are the workhorses of modern signal processing and statistical learning, and are widely used to fit complex models to random data. While the choice of an algorithm and its hyperparameters determines both the speed and fidelity of the learning pipeline, it is common for this choice to be made heuristically, either by expensive trial-and-error or by comparing upper bounds on convergence rates of various candidate algorithms. Motivated by these issues, we develop a principled framework that produces sharp, iterate-by-iterate characterizations of solution quality for complex iterative algorithms on several nonconvex model-fitting problems with random data. Such sharp predictions can provide precise separations between families of algorithms while also revealing non-standard convergence phenomena. We will showcase the general framework on several canonical models in statistical machine learning.
Ashwin Pananjady

Fri 2:30 p.m. – 3:00 p.m.

Provable Feature Learning in Gradient Descent, Jason Lee (Plenary speaker)
Abstract: We focus on the task of learning a single-index model $\sigma(w^* \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions, including the special case when $\sigma$ is a $k$th-order Hermite polynomial, which corresponds to the Gaussian analog of parity learning. Prior work has shown that the sample complexity of learning $w^*$ is governed by the *information exponent* $k^*$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Prior upper bounds show that $n \gtrsim d^{k^*-1}$ samples suffice for learning $w^*$ and that this is tight for online SGD (Ben Arous et al., 2020). However, the CSQ lower bound for gradient-based methods only shows that $n \gtrsim d^{k^*/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^*$ with $n \gtrsim d^{k^*/2}$ samples.
Next, we turn to the problem of learning multi-index models $f(x) = g(Ux)$, where $U$ encodes a latent representation of low dimension. Significant prior work has established that neural networks trained by gradient descent behave like kernel methods, despite the significantly worse empirical performance of kernel methods. In this work, however, we demonstrate that for this large class of functions there is a large gap between kernel methods and gradient descent on a two-layer neural network, by showing that gradient descent learns representations relevant to the target task. We also demonstrate that these representations allow for efficient transfer learning, which is impossible in the kernel regime. Specifically, we consider the problem of learning polynomials which depend on only a few relevant directions, i.e. of the form $f^*(x) = g(Ux)$ where $U$ is $d \times r$. When the degree of $f^*$ is $p$, it is known that $n \asymp d^p$ samples are necessary to learn $f^*$ in the kernel regime. Our primary result is that gradient descent learns a representation of the data which depends only on the directions relevant to $f^*$. This results in an improved sample complexity of $n \asymp d^2 r + d r^p$. Furthermore, in a transfer-learning setup where the data distributions in the source and target domains share the same representation $U$ but have different polynomial heads, we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.
Jason Lee

Fri 3:00 p.m. – 3:01 p.m.

Closing Remarks
Cristóbal Guzmán



DetCGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization (Poster)
This paper introduces a new method for minimizing matrix-smooth non-convex objectives through the use of novel Compressed Gradient Descent (CGD) algorithms enhanced with a matrix-valued stepsize. The proposed algorithms are theoretically analyzed first in the single-node and subsequently in the distributed settings. Our theoretical results reveal that the matrix stepsize in CGD can capture the objective's structure and lead to faster convergence compared to a scalar stepsize. As a by-product of our general results, we emphasize the importance of selecting the compression mechanism and the matrix stepsize in a layer-wise manner, taking advantage of model structure. Moreover, we provide theoretical guarantees for free compression, by designing specific layer-wise compressors for the non-convex matrix-smooth objectives. Our findings are supported with empirical evidence.
Hanmin Li · Avetik Karagulyan · Peter Richtarik
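For readers unfamiliar with the CGD template this paper builds on, here is a single-node sketch with the standard unbiased rand-k sparsifier and a plain scalar stepsize; the matrix-valued stepsize that is the paper's actual contribution is not reproduced here:

```python
import random

def rand_k(v, k):
    # Unbiased rand-k sparsification: keep k random coordinates, rescale by d/k
    # so that the compressed vector equals the input in expectation.
    d = len(v)
    kept = set(random.sample(range(d), k))
    return [vi * d / k if i in kept else 0.0 for i, vi in enumerate(v)]

def compressed_gd(grad, x0, eta, k, steps):
    # Single-node compressed gradient descent with a scalar stepsize eta.
    x = list(x0)
    for _ in range(steps):
        c = rand_k(grad(x), k)
        x = [xi - eta * ci for xi, ci in zip(x, c)]
    return x

random.seed(0)
# toy quadratic f(x) = ||x||^2, gradient 2x, compress to k = 2 of d = 4 coords
x_out = compressed_gd(lambda x: [2.0 * xi for xi in x],
                      [1.0, 2.0, 3.0, 4.0], eta=0.05, k=2, steps=3000)
```

The compressor is unbiased, so convergence in expectation follows as for plain GD with extra variance; the paper's point is that a stepsize matrix (rather than `eta`) can exploit anisotropic smoothness.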


Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties (Poster)
In this work, we study optimization problems of the form $\min_x \max_y f(x, y)$, where $f(x, y)$ is defined on a product Riemannian manifold $\mathcal{M} \times \mathcal{N}$ and is $\mu_x$-strongly geodesically convex (g-convex) in $x$ and $\mu_y$-strongly g-concave in $y$, for $\mu_x, \mu_y \geq 0$. We design accelerated methods when $f$ is $(L_x, L_y, L_{xy})$-smooth and $\mathcal{M}$, $\mathcal{N}$ are Hadamard. To that aim we introduce new g-convex optimization results, of independent interest: we show global linear convergence for metric-projected Riemannian gradient descent and improve existing accelerated methods by reducing geometric constants. Additionally, we complete the analysis of two previous works applying to the Riemannian min-max case by removing an assumption about iterates staying in a prespecified compact set.
David Martinez-Rubio · Christophe Roux · Christopher Criscitiello · Sebastian Pokutta


Risk Bounds of Accelerated SGD for Overparameterized Linear Regression (Poster)
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning. While existing optimization theory can explain its faster convergence, it falls short in explaining its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression. We establish an instance-dependent excess risk bound for ASGD within each eigensubspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues.
Xuheng Li · Yihe Deng · Jingfeng Wu · Dongruo Zhou · Quanquan Gu


Follow the flow: Proximal flow inspired multi-step methods (Poster)
We investigate a family of multi-step proximal point methods, the backwards differentiation formulas, which are inspired by implicit linear discretizations of gradient flow. The resulting methods are multi-step proximal point methods with a per-update computational cost similar to that of the proximal point method. We explore several optimization methods where applying an approximate multi-step proximal point method results in improved convergence behavior. We argue that this is the result of lowering the truncation error in approximating the gradient flow.
Yushen Huang · Yifan Sun


A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning (Poster)
In this paper, we propose a new algorithm, termed Predicting Clipping Asynchronous Stochastic Gradient Descent (PC-ASGD), to address the issue of staleness and time delay in asynchronous distributed learning settings. Specifically, PC-ASGD has two steps: the predicting step leverages gradient prediction via Taylor expansion to reduce the staleness of the outdated weights, while the clipping step selectively drops the outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the effects of these two steps. We theoretically present the convergence rate of the proposed algorithm with constant step size, accounting for the effects of delay, when the smooth objective functions are nonconvex. For empirical validation, we demonstrate the performance of the algorithm with two deep neural network architectures on two benchmark datasets.
Haoxiang Wang · Zhanhong Jiang · Chao Liu · Soumik Sarkar · Dongxiang Jiang · Young Lee


Last Iterate Convergence of Popov Method for Non-monotone Stochastic Variational Inequalities (Oral)
This paper focuses on non-monotone stochastic variational inequalities (SVIs) that may not have a unique solution. A commonly used efficient algorithm to solve VIs is the Popov method, which is known to have the optimal convergence rate for VIs with Lipschitz continuous and strongly monotone operators. We introduce a broader class of structured non-monotone operators, namely *$p$-quasi-sharp* operators ($p > 0$), which allows for a tractable analysis of the convergence behavior of algorithms. We show that the stochastic Popov method converges *almost surely* to a solution for all operators from this class under a *linear growth* condition. In addition, we obtain the last iterate convergence rate (in expectation) for the method under a *linear growth* condition for $2$-quasi-sharp operators. Based on our analysis, we refine the results for smooth $2$-quasi-sharp and $p$-quasi-sharp operators (on a compact set), and obtain the optimal convergence rates.
Daniil Vankov · Angelia Nedich · Lalitha Sankar
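For context, the Popov method is a two-step "optimistic" scheme that reuses the previous extrapolation gradient, so it needs only one fresh operator evaluation per iteration. A deterministic sketch on the textbook bilinear saddle point $\min_x \max_y xy$ (the stochastic, quasi-sharp setting of the paper is not reproduced here):

```python
def popov(F, z0, gamma, steps):
    # Popov's method:
    #   w_k     = z_k - gamma * F(w_{k-1})
    #   z_{k+1} = z_k - gamma * F(w_k)
    # Only F(w_k) is freshly evaluated each iteration.
    z = list(z0)
    Fw = F(z)                     # bootstrap F(w_0) with F(z_0)
    for _ in range(steps):
        w = [zi - gamma * fi for zi, fi in zip(z, Fw)]
        Fw = F(w)
        z = [zi - gamma * fi for zi, fi in zip(z, Fw)]
    return z

# Bilinear saddle point min_x max_y x*y: operator F(x, y) = (y, -x),
# whose unique solution is the origin. Plain gradient descent-ascent
# diverges here; Popov's extrapolation makes the iterates spiral inward.
z_final = popov(lambda z: [z[1], -z[0]], [1.0, 1.0], gamma=0.1, steps=2000)
```

The step size `gamma` and the bilinear test operator are our illustrative choices, not the paper's.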


Generalisable Agents for Neural Network Optimisation (Poster)
Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO), a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsively scheduling hyperparameters during training. GANNO utilises an agent per layer that observes localised network dynamics and accordingly takes actions to adjust these dynamics at a layer-wise level to collectively improve global performance. In this paper, we use GANNO to control the layer-wise learning rate and show that the framework can yield useful and responsive schedules that are competitive with handcrafted heuristics. Furthermore, GANNO is shown to perform robustly across a wide variety of unseen initial conditions, and can successfully generalise to harder problems than it was trained on. Our work presents an overview of the opportunities that this paradigm offers for training neural networks, along with key challenges that remain to be overcome.
Kaleab Tessera · Callum R. Tilbury · Sasha Abramowitz · Ruan John de Kock · Omayma Mahjoub · Benjamin Rosman · Sara Hooker · Arnu Pretorius


Accelerated gradient descent: A guaranteed bound for a heuristic restart strategy (Poster)
The $\mathcal{O}(1/k^2)$ convergence rate in function value of accelerated gradient descent is optimal, but there are many modifications that have been used to speed up convergence in practice. Among these modifications are restarts, that is, starting the algorithm anew with the current iterate considered as the initial point. We focus on the adaptive restart techniques introduced by O'Donoghue and Candes, specifically their gradient restart strategy. While the gradient restart strategy is a heuristic in general, we prove that applying gradient restarts preserves and in fact improves the $\mathcal{O}(1/k^2)$ bound, hence establishing function value convergence, for one-dimensional functions. Applications of our results to separable and nearly separable functions are presented.
Walaa Moursi · Stephen Vavasis · Viktor Pavlovic
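The gradient restart strategy referenced above resets the momentum whenever the gradient at the look-ahead point forms an acute angle with the momentum direction, i.e. whenever the momentum points "uphill". A one-dimensional sketch (the step size, momentum weights, and test function are our illustrative choices):

```python
def agd_gradient_restart(grad, x0, eta, steps):
    # Nesterov-type accelerated gradient descent with the O'Donoghue-Candes
    # gradient restart test: if <grad f(y_k), x_k - x_{k-1}> > 0, reset momentum.
    x_prev, x, t = x0, x0, 1.0
    for _ in range(steps):
        momentum = x - x_prev
        y = x + ((t - 1.0) / (t + 2.0)) * momentum   # look-ahead point
        g = grad(y)
        if g * momentum > 0.0:        # momentum points uphill: restart
            t, y, g = 1.0, x, grad(x)
        x_prev, x = x, y - eta * g
        t += 1.0
    return x

# minimize f(x) = (x - 1)^2 from x0 = 10 (L = 2, so eta = 0.25 <= 1/L)
x_min = agd_gradient_restart(lambda x: 2.0 * (x - 1.0), x0=10.0, eta=0.25, steps=200)
```

On this quadratic the restart fires whenever the iterates overshoot the minimizer, suppressing the oscillations that plain acceleration exhibits.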


Adagrad Promotes Diffuse Solutions In Overparameterized Regimes (Poster)
With the prevalence of overparameterized models in deep learning, the choice of optimizer plays a significant role in a model's generalization ability due to solution selection bias. This work focuses on the adaptive gradient optimizer Adagrad in the overparameterized least-squares regime. We empirically find that when using sufficiently small step sizes, Adagrad promotes diffuse solutions, in the sense of uniformity among the coordinates of the solution. Additionally, we theoretically show that Adagrad's solution, under the same conditions, exhibits greater diffusion compared to the solution obtained through gradient descent (GD) by analyzing the ratio of their updates. Lastly, we empirically compare the performance of Adagrad and GD on generated datasets. We observe a consistent trend that Adagrad promotes more diffuse solutions, which aligns with our theoretical analysis.
Andrew Rambidis · Jiayi Wang
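The diffusion effect can be seen on a one-sample overparameterized least-squares toy problem: with features of unequal scale, Adagrad's per-coordinate normalization moves both coordinates by the same amount, whereas GD would favor the large-scale coordinate. A sketch on our own toy instance (not the paper's experiments):

```python
import math

def adagrad_ls(x_feat, y, eta=0.01, steps=20000, eps=1e-12):
    # Adagrad on f(w) = 0.5 * (w . x - y)^2 with a single sample (x, y).
    # The accumulator acc[i] normalizes each coordinate's effective step.
    w = [0.0] * len(x_feat)
    acc = [0.0] * len(x_feat)     # running sums of squared per-coord gradients
    for _ in range(steps):
        r = sum(wi * xi for wi, xi in zip(w, x_feat)) - y   # residual
        for i, xi in enumerate(x_feat):
            g = r * xi
            acc[i] += g * g
            w[i] -= eta * g / (math.sqrt(acc[i]) + eps)
    return w

# features [1, 2]: GD from zero would return the min-norm solution (2/5, 4/5),
# while Adagrad's normalized updates keep the coordinates equal ("diffuse").
w_ada = adagrad_ls([1.0, 2.0], 2.0)
```

Here both coordinates receive identical updates at every step, so the interpolating solution found is (approximately) uniform across coordinates.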


Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs (Poster)
This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an optimal policy with high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy, and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs we discover, called limited stochasticity. The property says that for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken, and then fine-tunes the distributions of these stochastic decisions. PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs a near-optimal policy with high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{\frac{4}{5}})$ under a model-free algorithm, where $K$ is the total number of episodes.
Zihan Zhou · Honghao Wei · Lei Ying


Reducing Predict and Optimize to Convex Feasibility (Poster)
Numerous applications in operations research and computer science require a combination of prediction and optimization: use historical data to predict the parameters of an optimization problem, and solve the optimization problem to output a decision. Addressing these two challenges independently results in the *predict-then-optimize* problem. This approach can result in discrepancies between the prediction error, minimized during training, and the ultimate objective of minimizing the decision error. Consequently, recent work has focused on the *predict and optimize* (PO) framework, which trains an end-to-end model from the data to the decisions. We focus on linear programs (LPs) within the PO framework, where the main challenge is handling the non-differentiability of LPs. For a linear prediction model, we present a novel reduction from PO to a convex feasibility problem. This reduction enables us to use alternating projections onto convex sets for solving the PO problem, resulting in a computationally efficient and theoretically principled algorithm. Finally, we validate the effectiveness of our approach on synthetic shortest path and fractional knapsack problems, demonstrating improved performance compared to prior work.
Saurabh Mishra · Sharan Vaswani


Diversity-adjusted adaptive step size (Poster)
Optimizing machine learning models often requires careful tuning of parameters, especially the learning rate. Traditional methods involve exhaustive searches or adopting pre-established rates, both with drawbacks. The former is computationally intensive, a concern amplified by the trend toward larger models like large language models (LLMs). The latter risks suboptimal model training. Consequently, there is growing research on adaptive and parameter-free approaches to reduce reliance on manual step size tuning. While adaptive gradient methods like AdaGrad, RMSProp, and Adam aim to adjust learning rates dynamically, they still rely on learning rate parameters dependent on problem-specific characteristics. Our work explores the interplay between step size and gradient dissimilarity, introducing a "diversity-adjusted adaptive step" that adapts to different levels of dissimilarity in sampled gradients within the SGD algorithm. We also provide approximate algorithms to compute this step size efficiently while maintaining performance.
Parham Yazdkhasti · Xiaowen Jiang · Sebastian Stich


Global CFR: Meta-Learning in Self-Play Regret Minimization (Poster)
In real-world situations, players often encounter a distribution of similar but distinct games, like poker games with different public cards or trading varied correlated stock market assets. While these games exhibit related equilibria, current literature mainly delves into single games or their repeated versions. (Sychrovsky et al., 2023) recently introduced offline meta-learning to accelerate equilibrium discovery for such distributions in a single-player online setting. We build upon this, extending to a two-player zero-sum self-play setting. Our method uniquely integrates information for next-strategy selection for both players across all decision states, promoting global communication as opposed to the traditional local regret decomposition. Evaluations on distributions of matrix and sequential games reveal that our meta-learned algorithms surpass their non-meta-learned variants.
David Sychrovský · Michal Sustr · Michael Bowling · Martin Schmid


Noise Injection Irons Out Local Minima and Saddle Points (Poster)
Nonconvex optimization problems are ubiquitous in machine learning, especially in deep learning. It has been observed in practice that injecting artificial noise into stochastic gradient descent (SGD) can sometimes improve training and generalization performance. In this work, we formalize noise injection as a smoothing operator and review and derive convergence guarantees of SGD under smoothing. We empirically found that Gaussian smoothing works well for training two-layer neural networks, but these findings do not translate to deeper nets. We would like to use this contribution to stimulate a discussion in the community to further investigate the impact of noise in training machine learning models.
Konstantin Mishchenko · Sebastian Stich
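The smoothing view of noise injection can be sketched as SGD whose gradient is evaluated at a Gaussian-perturbed point, i.e. a one-sample estimate of the gradient of $f_\sigma(x) = \mathbb{E}_u[f(x + \sigma u)]$. A toy-quadratic sketch with our own parameter choices:

```python
import random

def smoothed_sgd(grad, x0, eta, sigma, steps):
    # Each step perturbs the iterate by sigma * u, u ~ N(0, I), and descends
    # along grad f evaluated at the perturbed point: an unbiased estimate of
    # the gradient of the Gaussian smoothing f_sigma of f.
    x = list(x0)
    for _ in range(steps):
        u = [random.gauss(0.0, 1.0) for _ in x]
        g = grad([xi + sigma * ui for xi, ui in zip(x, u)])
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

random.seed(0)
# toy quadratic f(x) = ||x||^2; iterates settle into a small noise ball at 0
x_sm = smoothed_sgd(lambda z: [2.0 * zi for zi in z],
                    [3.0, -2.0], eta=0.05, sigma=0.1, steps=1000)
```

On a nonconvex loss, the same perturbation effectively averages the landscape over a `sigma`-ball, which is the "ironing out" of sharp local minima and saddles the title refers to.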


How to Guess a Gradient (Poster)
What can you say about the gradient of a neural network without *computing a loss* or *knowing the label*? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. They lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Exploiting this structure can significantly improve gradient-free optimization schemes based on directional derivatives, which until now have struggled to scale beyond small networks trained on MNIST. We study how to narrow the gap in optimization performance between methods that calculate exact gradients and those that use directional derivatives, demonstrate new phenomena that occur when using these methods, and highlight new challenges in scaling these methods.
Utkarsh Singhal · Brian Cheung · Kartik Chandra · Jonathan Ragan-Kelley · Josh Tenenbaum · Tomaso Poggio · Stella X. Yu
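A generic directional-derivative scheme of the kind the abstract refers to is the "forward gradient" guess: estimate the derivative along one random direction and move along that direction. This baseline estimator (with a finite-difference directional derivative) is our illustration, not the paper's subspace-based method:

```python
import random

def forward_gradient_step(f, x, eta, h=1e-5):
    # Draw a random Gaussian direction v, estimate the directional derivative
    # D_v f(x) by finite differences, and use g_hat = D_v f(x) * v as the
    # gradient guess; E[g_hat] equals the true gradient since E[v v^T] = I.
    v = [random.gauss(0.0, 1.0) for _ in x]
    dvf = (f([xi + h * vi for xi, vi in zip(x, v)]) - f(x)) / h
    return [xi - eta * dvf * vi for xi, vi in zip(x, v)]

random.seed(0)
sphere = lambda z: sum(zi * zi for zi in z)   # toy loss f(x) = ||x||^2
x_fg = [5.0, -3.0]
for _ in range(2000):
    x_fg = forward_gradient_step(sphere, x_fg, eta=0.05)
```

The guess is unbiased but high-variance in high dimensions, which is exactly the scaling bottleneck that exploiting the predictable gradient subspace is meant to address.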


Stochastic FISTA Step Search Algorithm for Convex Optimization (Poster)
In this paper, we propose an accelerated stochastic step search algorithm which combines an accelerated method with the fully adaptive step size parameter for convex problems of (Scheinberg et al., 2014) with the stochastic step search analysis of (Paquette and Scheinberg, 2020). Under appropriate conditions on the accuracy of the estimates of the gradient and function value, our algorithm achieves an expected iteration complexity of $\mathcal{O}(1/\sqrt{\epsilon})$ to reach an $\epsilon$-accurate solution satisfying $f(x) - f_* \leq \epsilon$. This complexity matches the iteration complexity of the deterministic Nesterov accelerated and FISTA algorithms (Nesterov, 1983; Beck and Teboulle, 2009). This paper continues the line of work on stochastic adaptive algorithms studied in (Berahas et al., 2021; Blanchet et al., 2019; Paquette and Scheinberg, 2020) and is the first to develop an accelerated gradient descent type algorithm in this domain.
Trang H. Tran · Lam Nguyen · Katya Scheinberg
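For reference, the deterministic FISTA scheme this paper builds on uses the momentum recursion $t_{k+1} = (1 + \sqrt{1 + 4 t_k^2})/2$. A smooth-case sketch with the proximal step omitted and a fixed (rather than searched) step size; the stochastic step search itself is not reproduced:

```python
import math

def fista(grad, x0, L, steps):
    # FISTA / Nesterov acceleration for smooth convex f with step 1/L:
    #   y_k     = x_k + ((t_k - 1) / t_{k+1}) (x_k - x_{k-1})
    #   x_{k+1} = y_k - grad f(y_k) / L
    x_prev, x, t = x0, x0, 1.0
    for _ in range(steps):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, x = x, y - grad(y) / L
        t = t_next
    return x

# minimize f(x) = (x - 1)^2; L = 4 deliberately overestimates the true
# smoothness 2, mimicking the conservative step a step search would refine
x_fista = fista(lambda x: 2.0 * (x - 1.0), x0=10.0, L=4.0, steps=2000)
```

The paper's algorithm replaces the fixed `1/L` step with an adaptive step search driven by stochastic function and gradient estimates.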


K-Spin Ising Model for Combinatorial Optimization over Graphs: A Reinforcement Learning Approach (Poster)
Many combinatorial optimization problems are defined on graphs and are difficult to solve due to the existence of many local optima. In this paper, we propose a K-spin Ising model for graph-based combinatorial optimization (GCO) problems, inspired by the perspective of curriculum learning, which guides the policy neural networks to converge to high-quality solutions. First, we propose a *K-spin Ising model* whose *Hamiltonian* is minimized via reinforcement learning (RL) over multiple steps on graphs; the model is generic for GCO problems and easy to use. Second, we provide general K-spin Ising formulations for NP-complete problems, and present detailed formulations together with interpretations for the classical graph max-cut problem. Third, we evaluate our RL approach on both synthetic and benchmark datasets for the graph max-cut problem. On the synthetic dataset, our approach performs slightly better than (or on par with) the commercial solver Gurobi in small-scale scenarios (e.g., $100 \sim 3{,}000$ nodes) with a several-hundred-fold speedup, and outperforms Gurobi (by an improvement of $10\%$) in large-scale scenarios (e.g., $5{,}000 \sim 10{,}000$ nodes) with a tens-fold speedup. On the benchmark dataset, our approach obtains better (or equal) performance compared with five baseline approaches.
Xiao-Yang Liu · Ming Zhu


Parameter-Agnostic Optimization under Relaxed Smoothness (Poster)
In training machine learning models, the tuning of hyperparameters such as the stepsize is both time-consuming and intricate. To address this challenge, many adaptive optimization algorithms have been developed to achieve near-optimal complexities, even when stepsizes are independent of problem parameters, provided the function is $L$-smooth. However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all current convergence results still necessitate tuning the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum can achieve a near-optimal complexity without prior knowledge of any problem parameter, though this introduces an exponential term dependent on $L_1$. We further establish that this term is unavoidable for such schemes. Interestingly, in deterministic settings, the exponential factor can be negated by using Gradient Descent with a Backtracking Line Search. To our knowledge, these represent the first parameter-agnostic convergence results for this generalized smoothness paradigm.
Florian Hübler · Junchi YANG · Xiang Li · Niao He


Escaping mediocrity: how two-layer networks learn hard generalized linear models (Oral)
This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well established that in this scenario $n = O(d \log d)$ samples are typically needed. However, we provide precise results concerning the prefactors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of the SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.
Luca Arnaboldi · Florent Krzakala · Bruno Loureiro · Ludovic Stephan


The Expressive Power of Low-Rank Adaptation
(
Poster
)
>
link
*Low-Rank Adaptation* (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when the LoRA-rank is lower than this threshold. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.

Yuchen Zeng · Kangwook Lee 🔗 
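For intuition, a LoRA-style update keeps the pretrained weight $W$ frozen and trains only a rank-$r$ factorization $BA$ added to it. A minimal forward-pass sketch with hypothetical toy dimensions (not the paper's construction):

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def lora_forward(v, W, A, B, alpha=1.0):
    # Frozen base weight W (d_out x d_in) plus a low-rank update B @ A,
    # with B (d_out x r) and A (r x d_in); only A and B are trained.
    base = matvec(W, v)
    low_rank = matvec(B, matvec(A, v))  # costs O(r * (d_in + d_out))
    return [b + alpha * l for b, l in zip(base, low_rank)]

# Rank-1 adapter on a 2x2 identity weight.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # r x d_in = 1 x 2
B = [[0.5], [0.5]]      # d_out x r = 2 x 1
y = lora_forward([2.0, 3.0], W, A, B)  # -> [4.5, 5.5]
```

The adapter adds `B @ A @ v = [2.5, 2.5]` to the frozen output `[2.0, 3.0]`, so the effective weight is `W + BA` without ever materializing a full-rank update.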


FaDE: Fast DARTS Estimator on Hierarchical NAS Spaces
(
Poster
)
>
link
Vast search spaces and expensive architecture evaluations make neural architecture search a challenging problem. Hierarchical search spaces allow comparatively cheap evaluations of neural network sub-modules to serve as surrogates for full architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE, which uses differentiable architecture search to iteratively obtain relative performance predictions on finite regions of a hierarchical neural architecture search space. The relative nature of these ranks calls for a memoryless, batch-wise outer search algorithm, which we provide in the form of an evolutionary approach featuring a pseudo-gradient descent. FaDE is especially suited to deep hierarchical or multi-cell search spaces, which it can explore at linear instead of exponential cost, and therefore eliminates the need for a proxy search space. FaDE trains solely on the neural architecture search space, not on any space of neural architecture sub-modules. Our experiments show that, firstly, FaDE ranks on finite regions of the search space correlate with the corresponding architecture performances and, secondly, the ranks can empower a pseudo-gradient evolutionary search on the complete neural architecture search space. 
Simon Neumeyer · Julian J Stier · Michael Granitzer 🔗 


Nesterov Meets Robust Multi-task Learning Twice
(
Poster
)
>
link
In this paper, we study a temporal multi-task learning problem where we impose a smoothness constraint on the time-series weights. Besides, to select important features, group lasso is introduced. Moreover, the regression loss in each time frame is non-squared to alleviate the influence of varying scales of noise in each task, in addition to the nuclear norm for the low-rank property. We first formulate the objective as a max-min problem, where the dual variable can be optimized via an accelerated dual ascent method, while the primal variable can be solved via a smoothed Fast Iterative Shrinkage-Thresholding Algorithm (S-FISTA). We provide a convergence analysis of the proposed method, and experiments demonstrate its effectiveness. 
Yifan Kang · Kai Liu 🔗 


On the Interplay Between Stepsize Tuning and Progressive Sharpening
(
Poster
)
>
link
Recent empirical work has revealed an intriguing property of deep learning models by which the sharpness (largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability, given a \emph{fixed} stepsize [Cohen et al., 2022]. We investigate empirically how the sharpness evolves when using stepsize tuners, the Armijo line-search and Polyak stepsizes, that adapt the stepsize along the iterations to local quantities such as, implicitly, the sharpness itself. We find that the surprisingly poor performance of a classical Armijo line-search may be well explained by its tendency to ever-increase the sharpness of the objective in the full- or large-batch regimes. On the other hand, we observe that Polyak stepsizes generally operate at the edge of stability or even slightly beyond, while outperforming their Armijo and constant-stepsize counterparts. We conclude with an analysis suggesting that unlocking stepsize tuners requires an understanding of the joint dynamics of the stepsize and the sharpness. 
Vincent Roulet · Atish Agarwala · Fabian Pedregosa 🔗 
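For reference, the classical deterministic Armijo backtracking rule discussed above can be sketched in one dimension as follows; the constants are conventional defaults, not the paper's settings:

```python
def armijo_step(f, grad, x, eta0=1.0, c=1e-4, shrink=0.5):
    """Backtracking line search: shrink eta until the Armijo
    sufficient-decrease condition
        f(x - eta*g) <= f(x) - c * eta * ||g||^2
    holds, then take the gradient step with that eta."""
    g = grad(x)
    eta = eta0
    while f(x - eta * g) > f(x) - c * eta * g * g:
        eta *= shrink
    return x - eta * g

# Minimize f(x) = x^2 (gradient 2x) starting from x = 3.0.
x = 3.0
for _ in range(30):
    x = armijo_step(lambda z: z * z, lambda z: 2 * z, x)
# -> x == 0.0
```

On this quadratic the search accepts eta = 0.5 on the first iteration, which is exactly the Newton step, so the iterate lands on the minimizer immediately.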


Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
(
Poster
)
>
link
We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics. When training with gradient descent, the loss associated with low-frequency classes decreases more slowly than the loss associated with high-frequency classes. Under the heavy-tailed class imbalance found in language modeling tasks, most samples are from classes of low relative frequency, leading to an overall slow decrease of the average loss. Sign-based optimizers such as Adam and sign descent do not suffer from this problem and lead to a decrease on all classes. We give evidence of this behavior by training a 2-layer transformer on language data, a linear model on synthetic data whose only property is a heavy-tailed class distribution, and a convolutional network on a modified MNIST dataset made to exhibit heavy-tailed class imbalance. 
Robin Yadav · Frederik Kunstner · Mark Schmidt · Alberto Bietti 🔗 


Level Set Teleportation: the Good, the Bad, and the Ugly
(
Poster
)
>
link
We study level set teleportation, an optimization subroutine which seeks to accelerate gradient methods by maximizing the gradient along the level curve of parameters with the same objective value. Since the descent lemma implies that gradient descent decreases the objective proportional to the squared norm of the gradient, level set teleportation maximizes the one-step progress guarantee. We prove that level set teleportation neither improves nor worsens the convergence of gradient descent for strongly convex functions, while for convex functions teleportation can move iterates arbitrarily far from the global minimizers. To evaluate teleportation in practice, we develop a projected-gradient-type method requiring only Hessian-vector products. We use this method to show that initializing gradient methods using level set teleportation slightly underperforms standard initializations for both convex and non-convex optimization problems. As a result, we report a mixed picture: teleportation can be efficiently evaluated, but it appears to offer marginal gains. 
Aaron Mishkin · Alberto Bietti · Robert Gower 🔗 


An alternative approach to train neural networks using monotone variational inequality
(
Poster
)
>
link
We investigate an alternative approach to neural network training, which is a non-convex optimization problem, through the lens of a convex problem: solving a monotone variational inequality (MVI), inspired by the work of [Juditsky and Nemirovsky, 2019]. MVI solutions can be found by computationally efficient procedures, with performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery and prediction accuracy under the theoretical setting of training a single-layer linear neural network. We study the use of MVI for training multi-layer neural networks by proposing a practical and completely general algorithm called \textit{stochastic variational inequality} (\texttt{SVI}), and demonstrate its applicability in training networks with various architectures. We show competitive or better performance of \texttt{SVI} compared to the widely used stochastic gradient descent method (SGD) on both synthetic and real data prediction tasks across various performance metrics, especially its improved efficiency in the early stage of training.

Chen Xu · Xiuyuan Cheng · Yao Xie 🔗 


Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation
(
Poster
)
>
link
The model in constrained Markov decision processes (CMDPs) is often unknown and must be learned online while still ensuring the constraint is met, or at least that the violation is bounded over time. Some recent papers have made progress on this very challenging problem, but either need unsatisfactory assumptions such as knowledge of a safe policy, or have high cumulative regret. We propose the Safe PSRL algorithm, which does not need such assumptions and yet performs very well, both in terms of theoretical regret bounds and empirically. The algorithm achieves an efficient trade-off between exploration and exploitation through the posterior sampling principle, and provably suffers only bounded constraint violation by leveraging the idea of pessimism. Our algorithm is based on a primal-dual approach. We establish a sublinear $\tilde{\mathcal{O}}\left(H^{2.5} \sqrt{\mathcal{S}^2 \mathcal{A} K} \right)$ upper bound on the Bayesian reward objective regret, along with a \emph{bounded}, i.e., $\tilde{\mathcal{O}}\left(1\right)$, constraint violation regret over $K$ episodes for an $\mathcal{S}$-state, $\mathcal{A}$-action, horizon-$H$ CMDP.

Krishna C Kalagarla · Rahul Jain · Pierluigi Nuzzo 🔗 


Average-Constrained Policy Optimization
(
Poster
)
>
link
Reinforcement Learning (RL) with constraints is becoming an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average-criterion constrained MDPs remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well in the average CMDP setting. In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. We develop basic sensitivity theory for average MDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging MuJoCo environments, show the superior performance of the algorithm when compared to other state-of-the-art algorithms adapted for the average CMDP setting. 
Akhil Agnihotri · Rahul Jain · Haipeng Luo 🔗 


A novel analysis of gradient descent under directional smoothness
(
Poster
)
>
link
We develop new suboptimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving an implicit equation to obtain an adapted stepsize; we show that this equation is straightforward to solve for convex quadratics and leads to new guarantees for a classical stepsize sequence. For general functions, we prove that exponential search can be used to obtain a path-dependent convergence guarantee with only a log-log dependency on the global smoothness constant. Experiments on quadratic functions showcase the utility of our theory and connections to the edge-of-stability phenomenon. 
Aaron Mishkin · Ahmed Khaled · Aaron Defazio · Robert Gower 🔗 


The Sharp Power Law of Local Search on Expanders
(
Poster
)
>
link
Local search is a powerful heuristic in optimization and computer science, the complexity of which has been studied in the white-box and black-box models. In the black-box model, we are given a graph $G = (V,E)$ and oracle access to a function $f : V \to \mathbb{R}$. The local search problem is to find a vertex $v$ that is a local minimum, i.e. with $f(v) \leq f(u)$ for all $(u,v) \in E$, using as few queries to the oracle as possible. The query complexity is well understood on the grid and the hypercube, but much less is known beyond. We show that the query complexity of local search on $d$-regular expanders with constant degree is $\Omega\left(\frac{\sqrt{n}}{\log{n}}\right)$, where $n$ is the number of vertices of the graph. This matches, within a logarithmic factor, the upper bound of $\mathcal{O}(\sqrt{n})$ for constant-degree graphs from \cite{aldous1983minimization}, implying that steepest descent with a warm start is essentially an optimal algorithm for expanders. We obtain this result by considering a broader framework of graph features such as vertex congestion and separation number. We show that for each graph, the randomized query complexity of local search is $\Omega\left(\frac{n^{1.5}}{g}\right)$, where $g$ is the vertex congestion of the graph, and $\Omega\left(\sqrt[4]{\frac{s}{\Delta}}\right)$, where $s$ is the separation number and $\Delta$ is the maximum degree. For the separation number, the previous bound was $\Omega\left(\sqrt[8]{\frac{s}{\Delta}} /\log{n}\right)$, given by \cite{santha2004quantum} for quantum and randomized algorithms. To prove these results, we design a variant of the relational adversary method from \cite{Aaronson06}. Our variant is asymptotically at least as strong as the version in \cite{Aaronson06} for all randomized algorithms, as well as strictly stronger on some problems and easier to apply in our setting.

Nicholas Recker · Simina Branzei · Davin Choo 🔗 
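The black-box model above can be made concrete with a small sketch of steepest-descent local search from a random warm start (toy graph and oracle, not the paper's construction):

```python
import random

def local_search(graph, f, seed=0):
    """Steepest descent with a random warm start: query a random vertex,
    then repeatedly move to the best-valued neighbor until none improves.
    Returns a local minimum and the total number of oracle queries made."""
    rng = random.Random(seed)
    v = rng.choice(list(graph))
    queries = {v: f(v)}  # cache so each vertex is queried at most once
    while True:
        best = min(graph[v], key=lambda u: queries.setdefault(u, f(u)))
        if queries[best] >= queries[v]:
            return v, len(queries)  # no neighbor improves: local minimum
        v = best

# 6-cycle with f(v) = v: vertex 0 is the unique local (and global) minimum.
cycle = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
v_min, n_queries = local_search(cycle, lambda v: v)  # -> v_min == 0
```

On this cycle every descent path ends at vertex 0, and the query count is at most the number of vertices; the paper's lower bounds concern how large this count must be on expanders.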


Regret Bounds for Optimistic Follow The Leader: Applications in Portfolio Selection and Linear Regression
(
Poster
)
>
link
Follow-The-Leader (FTL) is a simple online learning algorithm that is often overlooked. This paper investigates the FTL algorithm and two of its variants: Optimistic FTL (OFTL), in the context of online learning with hints, and Follow The Approximate Leader (FTAL), with adaptive curvature. We provide a general regret inequality for OFTL that explicitly captures the effect of the hints and the curvature of the cost functions. This directly leads to a regret bound for FTAL. We generalize prior regret bounds for FTAL by incorporating adaptive curvature and the movement of the iterates. We demonstrate the applicability of our results by deriving regret bounds for the online portfolio selection problem using FTAL with adaptive curvature. We further show the applicability of OFTL by obtaining a uniform regret bound for online linear regression. Our analysis contributes to a better understanding of FTL and its variants in various online learning scenarios. 
Sudeep Raja Putta · Shipra Agrawal 🔗 
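As a concrete baseline, for quadratic losses $f_t(x) = \tfrac{1}{2}(x - z_t)^2$ the FTL leader has a closed form: the running mean of the observed $z_s$. A minimal sketch (illustrative plain FTL, not the paper's OFTL or FTAL variants):

```python
def ftl_quadratic(zs):
    """Follow-The-Leader for f_t(x) = 0.5 * (x - z_t)^2: play the minimizer
    of the cumulative past loss, i.e. the running mean of z_1..z_{t-1}."""
    plays, total = [], 0.0
    for t, z in enumerate(zs):
        plays.append(total / t if t > 0 else 0.0)  # arbitrary first play
        total += z
    return plays

plays = ftl_quadratic([1.0, 1.0, 1.0, 1.0])  # -> [0.0, 1.0, 1.0, 1.0]
```

After one round the leader locks onto the mean and incurs zero further loss, reflecting why FTL enjoys small regret on curved losses; it is on linear losses that plain FTL can oscillate badly.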


Bandit-Driven Batch Selection for Robust Learning under Label Noise
(
Poster
)
>
link
We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging combinatorial bandit algorithms. Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets. Experimental evaluations on the CIFAR-10 dataset reveal that our approach consistently outperforms existing methods across various levels of label corruption. Importantly, we achieve this superior performance without incurring the computational overhead commonly associated with auxiliary neural network models. This work presents a balanced trade-off between computational efficiency and model efficacy, offering a scalable solution for complex machine learning applications. 
Michal Lisicki · Mihai Nica · Graham Taylor 🔗 


Practical Principled Policy Optimization for Finite MDPs
(
Oral
)
>
link
We consider (stochastic) softmax policy gradient (PG) methods for finite Markov Decision Processes (MDPs). While the PG objective is not concave, recent research has used smoothness and gradient dominance to achieve convergence to an optimal policy. However, these results depend on having extensive knowledge of the environment, such as the optimal action or the true mean reward vector, to configure the algorithm parameters, making the resulting algorithms impractical in real applications. To alleviate this problem, we propose PG methods that employ an Armijo line-search in the deterministic setting and an exponentially decreasing stepsize in the stochastic setting. We demonstrate that the proposed algorithms offer theoretical guarantees similar to previous works, but no longer require knowledge of oracle-like quantities. Furthermore, we apply similar techniques to develop practical, theoretically sound entropy-regularized methods for both deterministic and stochastic settings. Finally, we empirically compare the proposed methods with previous approaches in single-state MDP environments. 
Michael Lu · Matin Aghaei · Anant Raj · Sharan Vaswani 🔗 


Adaptive Gradient Methods at the Edge of Stability
(
Poster
)
>
link
Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value: the stability threshold of a gradient descent algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during mini-batch training, especially as the batch size grows. Yet, even though adaptive methods train at the ``Adaptive Edge of Stability'' (AEoS), their behavior in this regime differs in a significant way from that of non-adaptive methods at the EoS. Whereas non-adaptive algorithms at the EoS are blocked from entering high-curvature regions of the loss landscape, adaptive gradient methods at the AEoS can keep advancing into high-curvature regions while adapting the preconditioner to compensate. Our findings can serve as a foundation for the community's future understanding of adaptive gradient methods in deep learning. 
Jeremy M Cohen · Behrooz Ghorbani · Shankar Krishnan · Naman Agarwal · Sourabh Medapati · Michal Badura · Daniel Suo · Zachary Nado · George Dahl · Justin Gilmer 🔗 
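The "stability threshold" language comes from the classical analysis of gradient descent on a quadratic: with curvature $c$, the iteration $x \leftarrow (1 - \eta c)\,x$ is stable iff $\eta \le 2/c$. A tiny sketch of that non-adaptive baseline fact (the paper's $38/\eta$ threshold for Adam is an empirical counterpart, not derived here):

```python
def gd_diverges(curvature, eta, steps=100):
    """On f(x) = (c/2) * x^2, gradient descent is x <- (1 - eta*c) * x,
    which is stable iff |1 - eta*c| <= 1, i.e. eta <= 2/c."""
    x = 1.0
    for _ in range(steps):
        x -= eta * curvature * x
    return abs(x) > 1.0

eta = 0.01
diverged_below = gd_diverges(curvature=2 / eta * 0.99, eta=eta)  # c < 2/eta
diverged_above = gd_diverges(curvature=2 / eta * 1.01, eta=eta)  # c > 2/eta
# -> diverged_below is False, diverged_above is True
```

Just below the threshold the multiplier is $-0.98$ and the iterate decays; just above it is $-1.02$ and the iterate blows up, which is exactly the boundary the edge-of-stability literature tracks.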


Non-Uniform Sampling and Adaptive Optimizers in Deep Learning
(
Poster
)
>
link
Stochastic gradient descent samples the training set uniformly to build an unbiased gradient estimate from a limited number of samples. However, at a given step of the training process, some data are more helpful than others for continuing to learn. Importance sampling for training deep neural networks has been widely studied to propose sampling schemes yielding better performance than uniform sampling. After recalling the theory of importance sampling for deep learning, this paper focuses on the interplay between the sampling scheme and the optimizer used. We show that sampling proportionally to the per-sample gradient norms is not optimal for adaptive optimizers, although it is for stochastic gradient descent in its standard form. This implies that new sampling schemes have to be designed with respect to the optimizer used; thus, using approximations of the per-sample gradient-norm scheme with adaptive optimizers is likely to yield unsatisfying results. 
Thibault Lahire 🔗 


Large-scale Non-convex Stochastic Constrained Distributionally Robust Optimization
(
Poster
)
>
link
Distributionally robust optimization (DRO) is a powerful framework for training models robust to data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions, and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, making it suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes $\chi^2$-divergences as a special case. We prove that our algorithm finds an $\epsilon$-stationary point with improved computational complexity over existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.

Qi Zhang · Shaofeng Zou · Yi Zhou · Lixin Shen · Ashley PraterBennette 🔗 


Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization
(
Poster
)
>
link
Stochastic gradient-based optimization is crucial for optimizing neural networks. While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improving optimizers requires second-order information. Such methods precondition the gradient using the objective's Hessian. Yet, computing the Hessian is usually expensive, and effectively using second-order information in the stochastic gradient setting is non-trivial. We propose Information-Theoretic Trust Region Optimization (arTuRO) for improved updates with uncertain second-order information. By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps accounting for the objective's curvature and the uncertainty in the parameters. Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process. We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least-squares approach, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD. 
Philipp Dahlinger · Philipp Becker · Maximilian Hüttenrauch · Gerhard Neumann 🔗 


Decentralized Learning Dynamics in the Gossip Model
(
Poster
)
>
link
We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are *decentralized*: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics and a certain class of *"zero-sum" multiplicative weight update* algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the sizes of $m$ and $n$) in an adversarial reward setting (where the mean of each arm's distribution can vary over time), when the number of rounds $T$ is at most logarithmic in $n$. Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.

John Lazarsfeld · Dan Alistarh 🔗 


Almost multi-secant BFGS quasi-Newton method
(
Poster
)
>
link
Quasi-Newton (QN) methods provide an alternative to second-order techniques for solving minimization problems by approximating curvature. This approach reduces computational complexity, as it relies solely on first-order information while satisfying the secant condition. This paper focuses on multi-secant (MS) extensions of QN, which enhance the Hessian approximation at low cost. Specifically, we use a low-rank perturbation strategy to construct an almost-multi-secant QN method that maintains positive definiteness of the Hessian estimate, which in turn helps ensure constant descent (and reduces method divergence). Our results show that careful tuning of the updates greatly improves the stability and effectiveness of multi-secant updates. 
Mokhwa Lee · Yifan Sun 🔗 


From 6235149080811616882909238708 to 29: Vanilla Thompson Sampling Revisited
(
Poster
)
>
link
In this work, we derive a new problem-dependent regret bound for Thompson Sampling with Gaussian priors (Algorithm 2 in [Agrawal and Goyal, 2017]), a classical stochastic bandit algorithm that has demonstrated excellent empirical performance and has been widely deployed in real-world applications. The existing regret bound is $\sum\limits_{i \in \mathcal{A}: \Delta_i >0}\frac{288 \left(e^{64}+6 \right) \ln \left(T\Delta_i^2 + e^{32} \right)}{\Delta_i} + \frac{10.5}{\Delta_i} + \Delta_i$, where $\mathcal{A}$ is the arm set, $\Delta_i$ denotes the single-round performance loss when pulling a suboptimal arm $i$ instead of the optimal arm, and $T$ is the learning time horizon. Since real-world learning tasks care about performance over a finite learning horizon $T$, the existing regret bound is only non-vacuous when $T > 288 \cdot e^{64}$, which may not be practical. Our new regret bound is $ \sum\limits_{i \in \mathcal{A}: \Delta_i >0} \frac{1252 \ln \left(T \Delta_i^2 + 100^{\frac{1}{3}}\right)}{\Delta_i} +\frac{18 \ln \left(T\Delta_i^2 \right)}{\Delta_i} + \frac{182.5}{\Delta_i}+ \Delta_i$, which tightens the leading term's coefficient significantly. Despite these improvements, we would like to emphasize that the goal of this work is to deepen the theoretical understanding of Thompson Sampling, to unlock the full potential of this classical learning algorithm for solving challenging real-world learning problems.

Bingshan Hu · Tianyue Zhang 🔗 
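For context, vanilla Thompson Sampling for Gaussian-reward bandits can be written in a few lines; this sketch uses simplified posterior variances $1/(n_i+1)$ as an assumption and is not the exact Algorithm 2 analyzed above:

```python
import random

def thompson_gaussian(means, T, seed=0):
    """Thompson Sampling with Gaussian priors: sample a posterior mean for
    each arm, pull the argmax, then update that arm's empirical mean."""
    rng = random.Random(seed)
    k = len(means)
    n = [0] * k        # pull counts
    mu = [0.0] * k     # empirical means
    for _ in range(T):
        # posterior sample for arm i: N(mu_i, 1 / (n_i + 1))
        theta = [rng.gauss(mu[i], 1.0 / (n[i] + 1) ** 0.5) for i in range(k)]
        i = max(range(k), key=lambda j: theta[j])
        r = rng.gauss(means[i], 1.0)               # observe noisy reward
        mu[i] = (mu[i] * n[i] + r) / (n[i] + 1)
        n[i] += 1
    return n

counts = thompson_gaussian([0.0, 1.0], T=2000)
```

With a gap of 1 between the arms, the posterior for the worse arm concentrates quickly and it is pulled only a vanishing fraction of the time, which is the behavior the regret bounds above quantify.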


Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning
(
Poster
)
>
link
Deep representation learning methods struggle with continual learning, suffering from both catastrophic forgetting of useful units and loss of plasticity, often due to rigid and unuseful units. While many methods address these two issues separately, only a few currently deal with both simultaneously. In this paper, we introduce Utility-based Perturbed Gradient Descent (UPGD) as a novel approach for the continual learning of representations. UPGD combines gradient updates with perturbations, applying smaller modifications to more useful units, protecting them from forgetting, and larger modifications to less useful units, rejuvenating their plasticity. We adopt the challenging setup of streaming learning as the testing ground and design continual learning problems with hundreds of non-stationarities and unknown task boundaries. We show that many existing methods suffer from at least one of the issues, predominantly manifested by their decreasing accuracy over tasks. In contrast, UPGD continues to improve performance and surpasses all methods in all problems, demonstrably capable of addressing both issues. 
Mohamed Elsayed · Rupam Mahmood 🔗 


Revisiting Random Weight Perturbation for Efficiently Improving Generalization
(
Poster
)
>
link
Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental problem in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one, led by sharpness-aware minimization (SAM), minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP); the other minimizes the expected Bayes objective with random weight perturbation (RWP). Although RWP has advantages in training time and is closely linked to AWP on a mathematical basis, its empirical performance always lags behind that of AWP. In this paper, we revisit RWP and analyze its convergence properties. We find that RWP requires a much larger perturbation magnitude than AWP, which leads to convergence issues. To resolve this, we propose mRWP, which incorporates the original loss objective to aid convergence, significantly lifting the performance of RWP. Compared with SAM, mRWP is more efficient since it enables parallel computing of the two gradient steps and faster convergence, with comparable or even better performance. We will release the code for reproducibility. 
Tao Li · Weihao weihao · Qinghua Tao · Zehao Lei · Yingwen Wu · Kun Fang · Mingzhen He · Xiaolin Huang 🔗 


MSL: An Adaptive Momentum-based Stochastic Line-search Framework
(
Poster
)
>
link
Various adaptive step sizes have been proposed recently to reduce the amount of tedious manual tuning. A popular example is backtracking line-search based on a stochastic Armijo condition. But the success of this strategy relies crucially on the search direction being a descent direction. Importantly, this condition is violated by both SGD with momentum (SGDM) and Adam, which are common choices in deep-net training. Adaptively choosing the step size in this setting is thus non-trivial and less explored, despite its practical relevance. In this work, we propose two frameworks, namely momentum correction and restart, that allow the use of stochastic line-search in conjunction with a generalized Armijo condition, and apply them to both SGDM and Adam. We empirically verify that the proposed algorithms are robust to the choice of the momentum parameter and other hyperparameters. 
Chen Fan · Sharan Vaswani · Christos Thrampoulidis · Mark Schmidt 🔗 


Noise Stability Optimization for Flat Minima with Tight Rates
(
Poster
)
>
link
Generalization properties are a central aspect of the design and analysis of learning algorithms. One notion that has been considered in many previous works as leading to good generalization is flat minima, which informally describes a loss surface that is insensitive to noise perturbations. However, the design of efficient algorithms (that are easy to analyze) to find them is relatively underexplored. In this paper, we propose a new algorithm to address this issue, which minimizes a stochastic optimization objective that averages noise perturbations injected into the weights of a function. This algorithm is shown to enjoy both theoretical and empirical advantages compared to existing algorithms involving worst-case perturbations. Theoretically, we show tight convergence rates of our algorithm for finding first-order stationary points of the stochastic objective. Empirically, the algorithm induces a penalty on the trace of the Hessian, leading to iterates that are flatter than those of SGD and other alternatives, with tighter generalization gaps. Altogether, this work contributes a provable and practical algorithm for finding flat minima by optimizing the noise stability properties of a function. 
Haotian Ju · Dongyue Li · Hongyang Zhang 🔗 


Dueling Optimization with a Monotone Adversary
(
Oral
)
>
link
We introduce and study the problem of \textit{dueling optimization with a monotone adversary}, which is a generalization of (noiseless) dueling convex optimization. The goal is to design an online algorithm to find a minimizer $\mathbf{x}^{\star}$ for a function $f\colon \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$. In each round, the algorithm submits a pair of guesses, i.e., $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$, and the adversary responds with \textit{any} point in the space that is at least as good as both guesses. The cost of each query is the suboptimality of the worse of the two guesses, i.e., $\max \left( f(\mathbf{x}^{(1)}), f(\mathbf{x}^{(2)}) \right) - f(\mathbf{x}^{\star})$. The goal is to minimize the number of iterations required to find an $\varepsilon$-optimal point and to minimize the total cost (regret) of the guesses over many rounds. Our main result is an efficient randomized algorithm for several natural choices of the function $f$ and set $\mathcal{X}$ that incurs cost $O(d)$ and iteration complexity $O(d\log(1/\varepsilon)^2)$. Moreover, our dependence on $d$ is asymptotically optimal, as we show examples in which any randomized algorithm for this problem must incur $\Omega(d)$ cost and iteration complexity.

Avrim Blum · Meghal Gupta · Gene Li · Naren Manoj · Aadirupa Saha · Yuanyuan Yang 🔗 


Noise-adaptive (Accelerated) Stochastic Heavy-Ball Momentum
(
Poster
)
>
link
We analyze the convergence of stochastic heavy ball (SHB) momentum in the smooth, strongly-convex setting. Kidambi et al. showed that SHB (with small minibatches) cannot attain an accelerated rate of convergence even for quadratics, and conjecture that the practical gains of SHB are a by-product of minibatching. We substantiate this claim by showing that SHB can obtain an accelerated rate when the minibatch size is larger than some threshold. In particular, for strongly-convex quadratics with condition number $\kappa$, we prove that SHB with the standard step-size and momentum parameters results in an $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \sigma\right)$ convergence rate, where $T$ is the number of iterations and $\sigma^2$ is the variance in the stochastic gradients. To ensure convergence to the minimizer, we propose a multi-stage approach that results in a noise-adaptive $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \frac{\sigma}{T}\right)$ rate. For general strongly-convex functions, we use the averaging interpretation of SHB along with exponential step-sizes to prove an $O\left(\exp\left(-\frac{T}{\kappa}\right) + \frac{\sigma^2}{T}\right)$ convergence to the minimizer. Finally, we empirically demonstrate the effectiveness of the proposed algorithms.
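As a reference point for the update analyzed above, here is a minimal NumPy sketch of deterministic heavy-ball momentum on a strongly-convex quadratic; the toy objective, step size, and momentum values below are illustrative assumptions (the standard choices for quadratics), not the paper's stochastic algorithm.

```python
import numpy as np

def heavy_ball(grad, x0, lr, beta, T):
    """Heavy-ball momentum: x_{t+1} = x_t - lr*grad(x_t) + beta*(x_t - x_{t-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(T):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Toy strongly-convex quadratic f(x) = 0.5 * x^T A x with condition number kappa = 100.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
L, mu = 100.0, 1.0
kappa = L / mu
# Standard heavy-ball parameters for quadratics (illustrative, not the paper's tuning).
lr = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2
x = heavy_ball(grad, np.array([1.0, 1.0]), lr, beta, T=500)
```

With these parameters the iterates contract at roughly the accelerated rate $(1 - 1/\sqrt{\kappa})$ per step on this noiseless quadratic.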

Anh Dang · Reza Babanezhad Harikandeh · Sharan Vaswani 🔗 


Unnormalized Density Estimation with Root Sobolev Norm Regularization
(
Poster
)
>
link
Density estimation is one of the most central problems in statistical learning. In this paper we introduce a new approach to nonparametric density estimation that is statistically consistent, is provably different from Kernel Density Estimation, makes the inductive bias of the model clear and interpretable, and performs well on a variety of relatively high dimensional problems. One of the key points of interest in terms of optimization is our use of natural gradients (in Hilbert spaces). The optimization problem we solve is nonconvex, and standard gradient methods do not perform well. However, we show that the problem is convex on a certain positive cone, and natural gradient steps preserve this cone. The standard gradient steps, on the other hand, tend to lose positivity. This is one of the few cases in the literature where the reasons for the practical preference for the natural gradient are clear. In more detail, our approach is based on regularizing a version of a Sobolev norm of the density, and there are several core components that enable the method. First, while there is no closed analytic form for the associated kernel, we show that one can approximate it using sampling. Second, appropriate initialization and natural gradients are used as discussed above. Finally, while the approach produces unnormalized densities, which prevents the use of cross-validation, we show that one can instead adopt Fisher-Divergence-based Score Matching methods for this task. We evaluate the resulting method on a comprehensive recent tabular anomaly detection benchmark suite which contains more than 15 healthcare- and biology-oriented data sets (ADBench), and find that it ranks second best among more than 15 algorithms. 
Mark Kozdoba · Binyamin Perets · Shie Mannor 🔗 


Accelerating Inexact HyperGradient Descent for Bilevel Optimization
(
Poster
)
>
link
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method, the Restarted Accelerated HyperGradient Descent (RAHGD) method, finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of RAHGD for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\ )\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.

Yang Haikuo · Luo Luo · Chris Junchi Li · Michael Jordan · Maryam Fazel 🔗 


High Dimensional Unbiased Estimation for Sequential Decision Making
(
Oral
)
>
link
We study the problem of making predictions of an adversarially chosen high dimensional state that are \emph{unbiased} subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers. We give efficient algorithms for solving this problem, as well as a number of applications that stem from choosing the set of conditioning events appropriately. For example, we can efficiently produce predictions targeted at any polynomial number of decision makers, such that if they best respond to our predictions, each of them has diminishing swap regret at the optimal rate. We then generalize this to the online combinatorial optimization problem, where the decision makers have large action spaces corresponding to structured subsets of a set of base actions: We give the first algorithms that can guarantee (to any polynomial number of decision makers) regret to the best fixed action, not just overall, but on any polynomial number of subsequences that can depend on the actions chosen as well as any external context. We show how playing in an extensive form game can be cast into this framework, and use these results to give efficient algorithms for obtaining subsequence regret in extensive form games, which gives a new family of efficiently obtainable regret guarantees that captures and generalizes previously studied notions like regret to informed causal deviations, and is generally incomparable to other known families of efficiently obtainable guarantees. We then turn to uncertainty quantification in machine learning, and consider the problem of producing \emph{prediction sets} for multiclass and multilabel classification problems. We show how to produce class scores that have \emph{transparent coverage guarantees}: they can be used to produce prediction sets that cover the true labels at the rate that they would have had the scores been true conditional probabilities. 
Moreover, we show how to do this such that the scores have improved Brier score (or cross-entropy loss) compared to any collection of benchmark models. Compared to conformal prediction techniques, this both gives increased flexibility and eliminates the need to choose a nonconformity score. 
Georgy Noarov · Ramya Ramalingam · Aaron Roth · Stephan Xie 🔗 


Efficient Learning in Polyhedral Games via Best Response Oracles
(
Poster
)
>
link
We study online learning and equilibrium computation in games with polyhedral decision sets with only first-order oracle and best-response oracle access. Our approach achieves constant regret in zero-sum games and $O(T^{1/4})$ in general-sum games, while using only $O(\log t)$ best-response queries at a given iteration $t$. This convergence occurs at a linear rate, though with a condition-number dependence. Our algorithm also achieves best-iterate convergence at a rate of $O(1/\sqrt{T})$ without such a dependence. Our algorithm uses a linearly convergent variant of Frank-Wolfe (FW) whose linear convergence depends on a condition number of the polytope known as the facial distance. We show two broad new results, characterizing the condition number when the polyhedral sets satisfy a certain structure.
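To make the oracle model concrete, here is a minimal sketch of vanilla Frank-Wolfe over the probability simplex, where the linear minimization oracle plays the role of a best-response oracle; the quadratic objective and the open-loop step-size schedule are illustrative assumptions (the paper's linearly-convergent variant is substantially more involved).

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, T):
    """Vanilla Frank-Wolfe over the probability simplex: the linear minimization
    oracle (a 'best response' to the current gradient) returns the vertex e_i
    with i = argmin_j grad(x)_j."""
    x = x0.copy()
    for t in range(T):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # LMO / best-response step
        gamma = 2.0 / (t + 2.0)        # standard open-loop step size
        x = (1 - gamma) * x + gamma * s
    return x

# Minimize ||x - c||^2 over the simplex; c itself lies in the simplex,
# so the optimum is x = c.
c = np.array([0.2, 0.3, 0.5])
grad = lambda x: 2 * (x - c)
x = frank_wolfe_simplex(grad, np.ones(3) / 3, T=5000)
```

Each iterate remains a convex combination of simplex vertices, so feasibility is maintained for free; this is the projection-free property that makes FW attractive with only oracle access.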

Darshan Chakrabarti · Gabriele Farina · Christian Kroer 🔗 


On the Convergence of Local SGD Under Third-Order Smoothness and Hessian Similarity
(
Poster
)
>
link
Local SGD (i.e., Federated Averaging without client sampling) is widely used for solving federated optimization problems in the presence of heterogeneous data. However, there is a gap between the existing convergence rates for Local SGD and its observed performance on real-world problems. It seems that current rates do not correctly capture the effectiveness of Local SGD. We first show that the existing rates for Local SGD in the heterogeneous setting cannot recover the correct rate when the global function is a quadratic. We then derive a new rate for the case where the global function is a general strongly convex function, depending on third-order smoothness and Hessian similarity. These additional parameters allow us to capture the problem in a more refined way and to overcome some of the limitations of the previous worst-case results derived under the standard assumptions. Finally, we show a rate for Local SGD when all client objectives are nonconvex quadratic functions with identical Hessians. 
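For context, Local SGD alternates local gradient steps on each client with periodic server-side averaging; the quadratic client objectives and parameter values in this sketch are illustrative assumptions, chosen so the global minimizer is known in closed form.

```python
import numpy as np

def local_sgd(client_grads, x0, lr, local_steps, rounds):
    """Each client runs `local_steps` gradient steps from the shared iterate,
    then the server averages the resulting local models (Federated Averaging
    without client sampling)."""
    x = x0.copy()
    for _ in range(rounds):
        local_models = []
        for grad in client_grads:
            xi = x.copy()
            for _ in range(local_steps):
                xi -= lr * grad(xi)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)
    return x

# Two heterogeneous quadratic clients f_i(x) = 0.5*||x - b_i||^2 with identical
# (identity) Hessians; the minimizer of the average is the mean of the b_i.
b = [np.array([1.0]), np.array([-1.0])]
client_grads = [lambda x, bi=bi: x - bi for bi in b]
x = local_sgd(client_grads, np.array([5.0]), lr=0.1, local_steps=10, rounds=50)
```

Because the client Hessians are identical here, the averaged fixed point coincides exactly with the global minimizer (zero in this example), which matches the identical-Hessian regime discussed in the abstract.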
Ali Zindari · Ruichen Luo · Sebastian Stich 🔗 


Adam through a Second-Order Lens
(
Poster
)
>
link
Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). We seek to combine the benefits of both approaches into a single computationally-efficient algorithm. Noting that second-order methods often depend on stabilising heuristics (such as Levenberg-Marquardt damping), we propose AdamQLR: an optimiser combining damping and learning rate selection techniques from K-FAC with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales, achieving competitive generalisation performance versus runtime. 
Ross Clarke · Baiyu Su · José Miguel HernándezLobato 🔗 


How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
(
Poster
)
>
link
This paper rigorously shows how over-parameterization dramatically changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting, where $M^* \in \mathbb{R}^{n \times n}$ is a positive semidefinite unknown matrix of rank $r \ll n$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. We give a novel $\Omega\left(1/T^2\right)$ lower bound of randomly initialized GD for the over-parameterized case ($k > r$), where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k = r$), where the convergence rate is $\exp\left(-\Omega\left(T\right)\right)$. Next, we study the asymmetric setting, where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r \ll \min\{n_1,n_2\}$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$, where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. We give the first global exact convergence result of randomly initialized GD for the exact-parameterization case ($k = r$) with an $\exp\left(-\Omega\left(T\right)\right)$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k > r$) with an $\exp\left(-\Omega\left(\alpha^2 T\right)\right)$ rate, where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega\left(1/T^2\right)$ to linear convergence. Therefore, we identify a surprising phenomenon: asymmetric parameterization can exponentially speed up convergence. Equally surprising is our analysis that highlights the importance of imbalance between $F$ and $G$. This is in sharp contrast to prior works, which emphasize balance. We further give an example showing the dependency on $\alpha$ in the convergence rate is unavoidable in the worst case. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate in the exact-parameterization case. We provide empirical studies to verify our theoretical findings.
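To illustrate the symmetric parameterization discussed above, here is a minimal sketch of gradient descent on the factorized objective $\frac{1}{4}\|XX^\top - M^*\|_F^2$; for simplicity it observes the full matrix rather than linear measurements, and the rank-1 instance, step size, and initialization scale are illustrative assumptions.

```python
import numpy as np

def factored_gd(M, k, lr, steps, init_scale=0.1, seed=0):
    """GD on L(X) = 0.25 * ||X X^T - M||_F^2 (symmetric parameterization)."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X = init_scale * rng.standard_normal((n, k))
    for _ in range(steps):
        R = X @ X.T - M
        X -= lr * (R @ X)   # gradient of 0.25*||XX^T - M||_F^2 is (XX^T - M) X
    return X

# Rank-1 PSD ground truth; exact parameterization k = r = 1, where the abstract
# predicts fast (linear) convergence.
u = np.array([[1.0], [2.0], [2.0]]) / 3.0   # unit vector
M = u @ u.T
X = factored_gd(M, k=1, lr=0.2, steps=2000)
err = float(np.linalg.norm(X @ X.T - M))
```

In the exact-parameterization run above the reconstruction error decays geometrically once the iterate aligns with the ground truth; rerunning with $k > r$ is one way to observe the slowdown the paper analyzes.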

Nuoya Xiong · Lijun Ding · Simon Du 🔗 


Exploring Modern Evolution Strategies in Portfolio Optimization
(
Poster
)
>
link
Black-box optimization (BBO) techniques are often the core engine used in combinatorial optimization problems, which include multi-asset class portfolio construction. The computational complexity of such evolutionary algorithms, however, is excessively high, to the point that finding optimal portfolios in large search spaces becomes intractable. Additionally, the learning dynamics of BBO methods are typically heuristic, as they do not use gradient evaluations. To alleviate these challenges, in this paper we set out to leverage advances in meta-learning-based evolution strategies (ES), Adaptive ES-Active Subspaces, and fast-moving natural ES to improve high-dimensional and large search space portfolio optimization. Using such modern ES algorithms in a series of risk-aware passive and active asset allocation problems, we obtain one to three orders of magnitude efficiency in finding optimal portfolios compared to vanilla BBO methods. Moreover, as we increase the number of asset classes, our modern suite of BBOs finds better local optima, resulting in better financial advice quality. 
Ramin Hasani · Abbas Ehsanfar · Greg Banis · Rusty Bealer · Amir Ahmadi 🔗 


Greedy Newton: Newton's Method with Exact Line Search
(
Poster
)
>
link
A defining characteristic of Newton's method is local superlinear convergence. In successively smaller neighborhoods around a strict minimizer, the objective becomes increasingly quadratic, the Newton direction gets closer to optimal, and the method accelerates to the solution. Outside this neighborhood, however, Newton's method converges slowly or even diverges. The issue of non-convergence is usually dealt with by (a) modifications to the Hessian for positive definiteness, and (b) using damped Newton steps. But these approaches change the nature of Newton's method. The former obscures second-order information and changes the update direction arbitrarily. The latter, normally implemented as an Armijo-like line search with an initial stepsize of one, is inexact and could unnecessarily restrict stepsizes to lie between zero and one. In this paper, we analyze Newton's method under an exact line search, which we call "Greedy Newton" (GN). Many problem structures allow for efficient and precise line searches but, as far as we know, a superlinear convergence proof does not exist for this setting. We show that GN not only retains the local superlinear convergence rate, but could also improve the global rate. We experimentally show that GN may work better than damped Newton by allowing stepsizes to deviate significantly from one. Our experiments also suggest that GN combined with Hessian modifications is a powerful optimization method that works in both convex and nonconvex settings. 
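A minimal sketch of the Newton-direction-plus-exact-line-search idea follows; the golden-section search, its bracket $[0, 10]$, and the separable convex test problem are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def exact_line_search(phi, lo=0.0, hi=10.0, iters=100):
    """Golden-section search for a minimizer of phi on [lo, hi]; phi is
    unimodal here since a convex function restricted to a line is convex."""
    gr = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - gr * (b - a), a + gr * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def greedy_newton(f, grad, hess, x0, iters=50):
    """Newton direction combined with an exact line search over the step size."""
    x = x0.copy()
    for _ in range(iters):
        d = -np.linalg.solve(hess(x), grad(x))          # Newton direction
        t = exact_line_search(lambda t: f(x + t * d))   # exact (numerical) step
        x = x + t * d
    return x

# Convex test problem: f(x) = sum_i (exp(x_i) - x_i), minimized at x = 0.
f = lambda x: np.sum(np.exp(x) - x)
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))
x = greedy_newton(f, grad, hess, np.array([2.0, -1.0]))
```

Note that the line search is free to return step sizes above one, which is exactly the extra freedom (relative to Armijo-style damping) that the abstract highlights.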
Betty Shea · Mark Schmidt 🔗 


A proximal augmented Lagrangian based algorithm for federated learning with constraints
(
Poster
)
>
link
This paper considers federated learning (FL) with constraints, where the central server and all local clients collectively minimize a sum of local objective functions subject to inequality constraints. To train the model without moving local data from clients to the central server, we propose an FL framework in which each local client performs multiple updates using the local objective and local constraints, while the central server handles the global constraints and performs aggregation based on the updated local models. In particular, we develop a proximal augmented Lagrangian (AL) based algorithm, where the subproblems are solved by an inexact alternating direction method of multipliers (ADMM) in a federated fashion. Under mild assumptions, we establish the worst-case complexity bounds of the proposed algorithm. Our numerical experiments demonstrate the practical advantages of our algorithm in solving linearly constrained quadratic programming and performing Neyman-Pearson classification in the context of FL. 
Chuan He · Le Peng · Ju Sun 🔗 


Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets
(
Poster
)
>
link
Second-order methods for deep learning (such as KFAC) can be useful for neural network training. However, they are often memory-inefficient and numerically unstable for low-precision training since their preconditioning Kronecker factors are dense, and require high-precision matrix inversion or decomposition. Thus, such methods are not widely used for training large neural networks such as transformer-based models. We address these two issues by (i) formulating an inverse-free update of KFAC and (ii) imposing structures in each of the Kronecker factors, resulting in a method we term structured inverse-free natural gradient descent (SINGD). On large modern neural networks, we show that, in contrast to KFAC, SINGD is memory efficient and numerically robust. 
Wu Lin · Felix Dangel · Runa Eschenhagen · Kirill Neklyudov · Agustinus Kristiadi · Richard Turner · Alireza Makhzani 🔗 


Statistical Inference of Adaptive Inexact Stochastic Newton Method
(
Poster
)
>
link
We study the practical statistical inference of the online second-order Newton method for general unconstrained stochastic optimization problems in the fixed-dimension setting. We consider the adaptive inexact stochastic Newton method, which is obtained by reducing an existing stochastic sequential quadratic programming (StoSQP) method to the unconstrained setting. Based on the asymptotic normality of the last iterate, we propose a weighted sample covariance matrix, which is a consistent covariance matrix estimator. With this estimator, we are able to conduct statistical inference on the solution of the stochastic optimization problem in practice. The update of the estimator is entirely online and efficient in computation and memory. We demonstrate the empirical performance through numerical experiments on linear regression models. 
Wei Kuang · Sen Na · Mihai Anitescu 🔗 


$f$-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization
(
Poster
)
>
link
Training and deploying machine learning models that meet fairness criteria for protected groups are fundamental in modern artificial intelligence. While numerous constraints and regularization terms have been proposed in the literature to promote fairness in machine learning tasks, most of these methods are not amenable to stochastic optimization due to the complex and nonlinear structure of constraints and regularizers. Here, the term ``stochastic'' refers to the ability of the algorithm to work with small minibatches of data. Motivated by the limitations of the existing literature, this paper presents a unified stochastic optimization framework for fair empirical risk minimization based on $f$-divergence measures ($f$-FERM). The proposed stochastic algorithm enjoys theoretical convergence guarantees. In addition, our experiments demonstrate the superiority of the fairness-accuracy tradeoffs offered by $f$-FERM for almost all batch sizes (ranging from full-batch to a batch size of one). Moreover, we show that our framework can be extended to the case where there is a distribution shift from training to test data. Our extension is based on a distributionally robust optimization reformulation of the $f$-FERM objective under $\ell_p$ norms as uncertainty sets. Again in this distributionally robust setting, $f$-FERM not only enjoys theoretical convergence guarantees but also outperforms other baselines in the literature in tasks involving distribution shifts. An efficient stochastic implementation of $f$-FERM is publicly available.

Sina Baharlouei · Shivam Patel · Meisam Razaviyayn 🔗 


Oracle Efficient Algorithms for Groupwise Regret
(
Poster
)
>
link
We study the problem of online prediction, in which at each time step $t \in \{1, 2, \cdots, T\}$, an individual $x_t$ arrives, whose label we must predict. Each individual is associated with various groups, defined based on their features such as age, sex, race, etc., which may intersect. Our goal is to make predictions that have regret guarantees not just overall but also simultaneously on each subsequence comprised of the members of any single group. Previous work such as Blum & Lykouris and Lee et al. provides attractive regret guarantees for these problems; however, these are computationally intractable on large model classes (e.g., the set of all linear models, as used in linear regression). We show that a simple modification of the sleeping experts technique of Blum & Lykouris yields an efficient reduction to the well-understood problem of obtaining diminishing external regret absent group considerations. Our approach gives similar regret guarantees compared to Blum & Lykouris; however, we run in time linear in the number of groups, and are oracle-efficient in the hypothesis class. This in particular implies that our algorithm is efficient whenever the number of groups is polynomially bounded and the external-regret problem can be solved efficiently, an improvement on Blum & Lykouris's stronger condition that the model class must be small. Our approach can handle online linear regression and online combinatorial optimization problems like online shortest paths. Beyond providing theoretical regret bounds, we evaluate this algorithm with an extensive set of experiments on synthetic data and on two real data sets: Medical costs and the Adult income dataset, both instantiated with intersecting groups defined in terms of race, sex, and other demographic characteristics. 
We find that uniformly across groups, our algorithm gives substantial error improvements compared to running a standard online linear regression algorithm with no groupwise regret guarantees.

Krishna Acharya · Eshwar Ram Arunachaleswaran · Juba Ziani · Aaron Roth · Sampath Kannan 🔗 


(Un)certainty selection methods for Active Learning on Label Distributions
(
Poster
)
>
link
Some supervised learning problems can require predicting a probability distribution over possible answers rather than one (set of) answer(s). In such cases, a major scaling issue is the amount of labels needed, since compared to their single- or multi-label counterparts, distributional labels are typically (1) harder to learn and (2) more expensive to obtain for training and testing. In this paper, we explore the use of active learning to alleviate this bottleneck. We progressively train a label distribution learning model by selectively labeling data, achieving the minimum error rate with fifty percent fewer data items than non-active learning strategies. Our experiments show that certainty-based query strategies outperform uncertainty-based ones on the label distribution learning problems we study. 
James Spann · Christopher Homan 🔗 


SGD batch saturation for training wide neural networks
(
Poster
)
>
link
The performance of the minibatch stochastic gradient method strongly depends on the batch size that is used. In the classical convex setting with interpolation, prior work showed that increasing the batch size linearly increases the convergence speed, but only up to a point; when the batch size is larger than a certain threshold (the critical batch size), further increasing the batch size only leads to negligible improvement. The goal of this work is to investigate the relationship between the batch size and convergence speed for a broader class of nonconvex problems. Building on recent improved convergence guarantees for SGD, we prove that a similar linear scaling and batch-size saturation phenomenon occurs for training sufficiently wide neural networks. We conduct a number of numerical experiments on benchmark datasets, which corroborate our findings. 
Chaoyue Liu · Dmitriy Drusvyatskiy · Misha Belkin · Damek Davis · Yian Ma 🔗 


Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches
(
Poster
)
>
link
Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Yet, the benefits of variance reduction for first-order methods tend to vanish in the large-batch setting, i.e., when stochastic gradients are computed from very large minibatches to leverage parallelization of modern computing architectures. On the other hand, incorporating second-order information via Newton-type methods has proven successful in improving the performance of large-batch algorithms. In this work, we show that, in the presence of second-order information, variance reduction in the gradient can provide significant convergence acceleration even when using extremely large-batch gradient estimates. To demonstrate this, we study a finite-sum minimization algorithm we call Stochastic Variance-Reduced Newton (SVRN). We show that SVRN provably accelerates existing stochastic Newton-type methods (such as Subsampled Newton), while retaining their parallelizable large-batch operations: The number of passes over the data is reduced from $O(\alpha\log(1/\epsilon))$ to $O\big(\frac{\log(1/\epsilon)}{\log(n)}\big)$, i.e., by a factor of $O(\alpha\log(n))$, where $n$ is the number of sum components and $\alpha$ is the approximation factor in the Hessian estimate. Surprisingly, this acceleration gets more significant the larger the data size $n$, and can be achieved with a unit step size.
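The ingredients described above, a full-gradient snapshot, a subsampled Hessian, and unit-step preconditioned updates, can be sketched for least squares as follows; the function name, parameter choices, and the deterministic sanity check are illustrative assumptions, not the paper's SVRN specification.

```python
import numpy as np

def svrn_sketch(A, b, x0, outer, inner, batch, hess_sample, seed=0):
    """Sketch of variance-reduced Newton for least squares f(x) = ||Ax - b||^2/(2n):
    each outer loop takes a full-gradient snapshot and a subsampled Hessian
    estimate; inner iterations apply SVRG-style variance-reduced gradients
    preconditioned by the Hessian estimate, with a unit step size."""
    rng = np.random.default_rng(seed)
    n, _ = A.shape
    x = x0.copy()
    for _ in range(outer):
        x_snap = x.copy()
        g_full = A.T @ (A @ x_snap - b) / n                  # snapshot full gradient
        S = rng.choice(n, size=hess_sample, replace=False)
        H = A[S].T @ A[S] / hess_sample                      # subsampled Hessian
        for _ in range(inner):
            i = rng.choice(n, size=batch, replace=False)
            g_i = A[i].T @ (A[i] @ x - b[i]) / batch
            g_snap = A[i].T @ (A[i] @ x_snap - b[i]) / batch
            g_vr = g_i - g_snap + g_full                     # variance-reduced gradient
            x = x - np.linalg.solve(H, g_vr)                 # unit-step Newton update
    return x

# Deterministic sanity check: with batch = hess_sample = n the update reduces to
# an exact Newton step, which solves this least-squares problem in one iteration.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
x_star = rng.standard_normal(3)
b = A @ x_star
x = svrn_sketch(A, b, np.zeros(3), outer=2, inner=2, batch=50, hess_sample=50)
```

Shrinking `batch` and `hess_sample` below `n` recovers the stochastic regime the abstract studies, where the variance-reduced gradient keeps the unit step size viable.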

Michal Derezinski 🔗 


Enhancing the Misreport Network for Optimal Auction Design
(
Poster
)
>
link
Optimal auction mechanism design has long been a focus in computer science and economics. While substantial progress has been made in single-item auctions, optimal designs for multi-item auctions have yet to be derived. Recent years have seen a surge in deriving near-optimal auctions through deep learning. As one of these approaches, ALGNet models the bidding process as a two-player game. The ALGNet model, however, adopted a rather simple design for generating optimal misreports to derive the regret of the trained auction mechanisms. We show that this design can be improved both in network structure and in the testing method. Specifically, we train a misreport network tailored to each individual bidder, which leads to better misreports. This approach is especially effective when the auctions are asymmetric. By studying misreports, we can get a more accurate estimate of the regret of the auction mechanism, thus enhancing its robustness. Experimental results demonstrate that our approach can detect misreports more effectively than previous methods, resulting in an increase in regret values as large as 70%. The new misreport network can also be applied to train auction mechanisms, allowing for a better description of the auction process. 
Haiying Wu · shuyuan you · Zhiqiang Zhuang · Kewen Wang · Zhe Wang 🔗 


Towards a Better Theoretical Understanding of Independent Subnetwork Training
(
Poster
)
>
link
Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model. 
Egor Shulgin · Peter Richtarik 🔗 


Adaptive Quasi-Newton and Anderson Acceleration Framework with Explicit Global (Accelerated) Convergence Rates
(
Poster
)
>
link
Despite the impressive numerical performance of quasi-Newton and Anderson/nonlinear-acceleration methods, their global convergence rates have remained elusive for over 50 years. This paper addresses this longstanding question by introducing a framework that derives novel and adaptive quasi-Newton or nonlinear/Anderson acceleration schemes. Under mild assumptions, the proposed iterative methods exhibit explicit, non-asymptotic convergence rates that blend those of gradient descent and Cubic Regularized Newton's method. The proposed approach also includes an accelerated version for convex functions. Notably, these rates are achieved adaptively, without prior knowledge of the function's smoothness parameter. The framework presented in this paper is generic, and algorithms such as Newton's method with random subspaces, finite differences, or lazy Hessians can be seen as special cases of this paper's algorithm. Numerical experiments demonstrate the efficiency of the proposed framework, even compared to the L-BFGS algorithm with Wolfe line search. 
Damien Scieur 🔗 


Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm
(
Poster
)
>
link
Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable. We take a step toward understanding a broad class of nonconvex-nonconcave minimax problems that do remain tractable. Specifically, we study minimax problems in geodesic metric spaces. The first main result of the paper is a geodesic metric space version of Sion's minimax theorem; we believe our proof is novel and broadly accessible, as it relies on the finite intersection property alone. The second main result is a specialization to geodesically complete Riemannian manifolds, for which we analyze first-order methods for smooth minimax problems. 
Peiyuan Zhang · Jingzhao Zhang · Suvrit Sra 🔗 


Cup Curriculum: Curriculum Learning on Model Capacity
(
Poster
)
>
link
Curriculum learning (CL) aims to increase the performance of a learner on a given task by applying a specialized learning strategy. This strategy focuses on either the dataset, the task, or the model. There is little to no work analysing the possibility of applying CL to the model capacity in natural language processing. To close this gap, we propose the cup curriculum. In a first phase of training, we use a variation of iterative magnitude pruning to reduce model capacity. These weights are reintroduced in a second phase, causing the model capacity to show a cup-shaped curve over the training iterations. We empirically evaluate different strategies of the cup curriculum and show that it reliably outperforms early stopping while exhibiting high resilience to overfitting. 
Luca Scharr · Vanessa Toborek 🔗 


An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization
(
Oral
)
>
link
We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$, where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $\delta,\epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, *nonsmooth optimization is as easy as smooth optimization*. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful geometric lemma regarding the Goldstein subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
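For background on the oracle model used above, here is a minimal sketch of the classical randomized two-point zero-order gradient estimator, which uses only function evaluations; the smooth test function and parameter values are illustrative assumptions (the paper's algorithm builds a Goldstein-subdifferential-based method on top of such evaluations).

```python
import numpy as np

def two_point_grad_estimate(f, x, delta, num_samples, seed=0):
    """Randomized two-point zero-order gradient estimator:
    g = (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    averaged over random directions u drawn uniformly from the unit sphere."""
    rng = np.random.default_rng(seed)
    d = x.size
    g = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)        # uniform direction on the sphere
        g += (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    return g / num_samples

# Smooth test function f(x) = 0.5*||x||^2, whose true gradient at x is x itself.
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.array([1.0, -2.0, 0.5])
g_hat = two_point_grad_estimate(f, x, delta=1e-3, num_samples=40000)
```

Each sample costs two function evaluations, and the estimator's variance (here of order $d$ per sample) is what drives the dimension-dependence that the paper's complexity bound tightens.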

Guy Kornowski · Ohad Shamir 🔗 
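Stochastic zero-order methods of this kind build on randomized finite-difference gradient estimates. The sketch below shows the classic two-point estimator on a quadratic, not the authors' algorithm; the sampling scheme and parameters are generic textbook choices.

```python
import numpy as np

def two_point_zo_gradient(f, x, delta, rng, n_samples=1):
    """Classic two-point zero-order gradient estimator:
    g = (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    averaged over random unit directions u (unbiased for linear/quadratic f)."""
    d = x.size
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    return g / n_samples

rng = np.random.default_rng(0)
f = lambda x: 0.5 * np.dot(x, x)            # true gradient is x itself
x = np.array([1.0, -2.0, 3.0])
g = two_point_zo_gradient(f, x, delta=1e-5, rng=rng, n_samples=2000)
```

The averaged estimate concentrates around the true gradient; only noisy function values, never gradients, are queried.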


Fair Minimum Representation Clustering
(
Poster
)
>
link
Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts) whose benefit can only be attained by groups when they reach a minimum level of representation (e.g., 50% to elect their desired candidate). This paper considers the problem of performing k-means clustering while ensuring groups (e.g., demographic groups) have that minimum level of representation in a specified number of clusters. We formulate the problem through a mixed-integer optimization framework and present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints. While incorporating the fairness criteria leads to an NP-hard assignment problem within the algorithm, we provide computational approaches that make the algorithm practical even for large datasets. Numerical results show that the approach is able to create fairer clusters with practically no increase in the clustering cost across standard benchmark datasets. 
Connor Lawless · Oktay Gunluk 🔗 


Fair Representation in Submodular Subset Selection: A Pareto Optimization Approach
(
Poster
)
>
link
In this paper, we study a novel multi-objective combinatorial optimization problem called Submodular Maximization with Fair Representation (SMFR), which selects subsets of bounded cost from a ground set such that a submodular (utility) function $f$ is maximized while a set of $d$ submodular (representativeness) functions $g_1, \ldots, g_d$ are also maximized. SMFR finds applications in machine learning problems where utility and representativeness objectives should be considered simultaneously, such as social advertising, recommendation, and feature selection. We show that the maximization of $f$ and $g_1, \ldots, g_d$ might conflict with each other, so that no single solution can approximate all of them at the same time. Therefore, we propose a Pareto optimization approach to SMFR, which finds a set of solutions approximating all Pareto-optimal solutions with different trade-offs between these objectives. Specifically, it converts an instance of SMFR into several submodular cover instances by adjusting the weights of the objective functions and provides approximate solutions by running the greedy algorithm on each submodular cover instance. Finally, we demonstrate the effectiveness of SMFR and our proposed approach on a real-world problem.

Adriano Fazzone · Yanhao Wang · Francesco Bonchi 🔗 


New Horizons in Parameter Regularization: A Constraint Approach
(
Poster
)
>
link
This work presents constrained parameter regularization (CPR), an alternative to traditional weight decay. Instead of applying a constant penalty uniformly to all parameters, we enforce an upper bound on a statistical measure (e.g., the L2-norm) of individual parameter groups. This reformulates learning as a constrained optimization problem. To solve it, we utilize an adaptation of the augmented Lagrangian method. Our approach allows for varying regularization strengths across different parameter groups, removing the need for explicit penalty coefficients in the regularization terms. CPR only requires two hyperparameters and introduces no measurable runtime overhead. We offer empirical evidence of CPR's effectiveness through experiments on the "grokking" phenomenon, object detection, and language modeling. Our findings show that CPR can counteract the effects of grokking, and it consistently matches or surpasses the performance of traditional weight decay. 
Jörg Franke · Michael Hefenbrock · Gregor Koehler · Frank Hutter 🔗 
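The augmented-Lagrangian mechanism behind such a constraint can be sketched for a single parameter group with an upper bound on the squared L2-norm. This is a generic sketch of the method class, not CPR itself; the hyperparameters (`mu`, `lr`, `kappa`) and the bare-penalty loss are illustrative assumptions.

```python
import numpy as np

def cpr_penalty_grad(w, lam, mu, kappa):
    """Gradient of the augmented-Lagrangian term for the constraint
    c(w) = ||w||^2 - kappa <= 0; active only while lam + mu*c(w) > 0."""
    c = np.dot(w, w) - kappa
    coeff = max(lam + mu * c, 0.0)           # projected multiplier estimate
    return coeff * 2 * w                     # d/dw of coeff * (||w||^2 - kappa)

def update_multiplier(w, lam, mu, kappa):
    """Standard augmented-Lagrangian multiplier ascent, clipped at zero."""
    return max(lam + mu * (np.dot(w, w) - kappa), 0.0)

# Drive a parameter group whose squared L2-norm starts at 4 below the bound kappa = 1.
w, lam, mu, kappa, lr = np.array([2.0, 0.0]), 0.0, 1.0, 1.0, 0.05
for _ in range(200):
    w = w - lr * cpr_penalty_grad(w, lam, mu, kappa)
    lam = update_multiplier(w, lam, mu, kappa)
```

In a real training loop the penalty gradient would be added to the task-loss gradient, and the multiplier would adapt per group, which is what removes the need for hand-tuned penalty coefficients.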


Continually Adapting Optimizers Improve Meta-Generalization
(
Poster
)
>
link
Meta-learned optimizers increasingly outperform analytical handcrafted optimizers such as SGD and Adam. On some tasks, however, they fail to generalize strongly, underperforming handcrafted methods. One can then fall back on handcrafted methods through a guard, combining the efficiency benefits of learned optimizers with the guarantees of analytical methods. At some point in the iterative optimization process, however, such guards may make the learned optimizer incompatible with the remaining optimization, and thus useless for further progress. Our novel method, Meta Guard, keeps adapting the learned optimizer to the target optimization problem. It experimentally outperforms other baselines, adapting to new tasks during training. 
Wenyi Wang · Louis Kirsch · Francesco Faccio · Mingchen Zhuge · Jürgen Schmidhuber 🔗 


Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism
(
Poster
)
>
link
Optimizing large memory-intensive neural networks requires distributing their layers across multiple GPUs (referred to as model parallelism). We develop a framework that decomposes a neural network layerwise and trains it by optimizing layerwise local losses in parallel. By using the resulting framework with GPipe [11] (an effective pipelining strategy for model parallelism), we propose the Surrogate Minimization (SM) algorithm. SM allows for multiple parallel updates to the layerwise parameters of a distributed neural network and consequently improves the GPU utilization of GPipe. Our framework ensures that the sum of local losses is a global upper bound on the neural network loss and can be minimized efficiently. Under mild technical assumptions, we prove that SM requires O(1/ε) iterations to guarantee convergence to an ε-neighbourhood of a stationary point of the neural network loss. Finally, our experimental results on MLPs demonstrate that SM leads to faster convergence compared to competitive baselines. 
Reza Asad · Reza Babanezhad Harikandeh · Issam Hadj Laradji · Nicolas Le Roux · Sharan Vaswani 🔗 


On the Parallel Complexity of Multilevel Monte Carlo in Stochastic Gradient Descent
(
Poster
)
>
link
In stochastic gradient descent (SGD) for sequential simulations such as neural stochastic differential equations, the Multilevel Monte Carlo (MLMC) method is known to offer better theoretical computational complexity than the naive Monte Carlo approach. However, in practice, MLMC scales poorly on massively parallel computing platforms such as modern GPUs because of its large parallel complexity, which is equivalent to that of the naive Monte Carlo method. To cope with this issue, we propose the delayed MLMC gradient estimator, which drastically reduces the parallel complexity of MLMC by recycling previously computed gradient components from earlier steps. The proposed estimator provably reduces the average parallel complexity per iteration at the cost of a slightly worse per-iteration convergence rate. In our numerical experiments, we use an example of deep hedging to demonstrate the superior parallel complexity of our method compared to standard MLMC in SGD. 
Kei Ishikawa 🔗 
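The telescoping structure underlying MLMC, and the coupling of adjacent levels on the same random draws, can be sketched on a toy mean-estimation problem. The level hierarchy (clipping thresholds) and sample counts below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def level_value(x, l):
    """Level-l approximation of x**2: clip harder at coarse levels."""
    c = 2.0 ** (l - 2)          # clipping threshold grows with the level
    return np.clip(x, -c, c) ** 2

def mlmc_estimate(rng, max_level, n_samples):
    """Telescoping MLMC estimator of E[Y_L] = E[Y_0] + sum_l E[Y_l - Y_{l-1}].
    Each correction term reuses the SAME draws x at both levels (the coupling),
    so its variance is small and finer levels need far fewer samples."""
    est = 0.0
    for l in range(max_level + 1):
        x = rng.normal(size=n_samples[l])
        coarse = level_value(x, l - 1) if l > 0 else 0.0
        est += (level_value(x, l) - coarse).mean()
    return est

rng = np.random.default_rng(0)
# Estimate E[X^2] = 1 for X ~ N(0,1); sample counts shrink geometrically per level.
est = mlmc_estimate(rng, max_level=5, n_samples=[20000, 10000, 5000, 2500, 1250, 625])
```

The delayed estimator of the paper goes further: it avoids recomputing all correction levels every SGD step, which this plain version still does.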


Pruning Neural Networks with Velocity-Constrained Optimization
(
Poster
)
>
link
Pruning has gained prominence as a way to compress overparameterized neural networks. While pruning can be understood as solving a sparsity-constrained optimization problem, pruning by directly solving this problem has been relatively underexplored. In this paper, we propose a method to prune neural networks using the MJ algorithm, which casts constrained optimization in the framework of velocity-constrained optimization. The experimental results show that our method can prune VGG-19 and ResNet-32 networks by more than 90% while preserving the high accuracy of the dense network. 
Donghyun Oh · Jinseok Chung · Namhoon Lee 🔗 


Feature Selection in Generalized Linear Models via the Lasso: To Scale or Not to Scale?
(
Poster
)
>
link
The Lasso regression is a popular regularization method for feature selection in statistics. Prior to computing the Lasso estimator in both linear and generalized linear models, it is common to conduct a preliminary rescaling of the feature matrix to ensure that all the features are standardized. Without this standardization, it is argued, the Lasso estimate will unfortunately depend on the units used to measure the features. We propose a new type of iterative rescaling of the features in the context of generalized linear models. Whilst existing Lasso algorithms perform a single scaling as a preprocessing step, the proposed rescaling is applied iteratively throughout the Lasso computation until convergence. We provide numerical examples, with both real and simulated data, illustrating that the proposed iterative rescaling can significantly improve the statistical performance of the Lasso estimator without incurring any significant additional computational cost. 
Anant Mathur · Sarat Moka 🔗 
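The unit-dependence that motivates standardization can be demonstrated with a plain coordinate-descent Lasso. This is a generic textbook solver, not the authors' iterative rescaling; the data, penalty level, and the factor-of-100 unit change are illustrative assumptions. Shrinking one feature's measurement unit makes the Lasso drop it entirely.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Plain coordinate-descent Lasso (no standardization), minimizing
    0.5 * ||y - X b||^2 / n + lam * ||b||_1."""
    n, d = X.shape
    b = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ b + X[:, j] * b[j]          # partial residual
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 1.0, 0.0]) + 0.1 * rng.normal(size=200)

b_orig = lasso_cd(X, y, lam=0.5)
X_rescaled = X.copy()
X_rescaled[:, 1] /= 100.0                            # same feature, different units
b_resc = lasso_cd(X_rescaled, y, lam=0.5)
```

With the original units the second feature is selected; after the unit change its correlation with the residual falls below the penalty threshold and its coefficient is exactly zero, even though the underlying signal is unchanged.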


DIRECT Optimisation with Bayesian Insights: Assessing Reliability Under Fixed Computational Budgets
(
Poster
)
>
link
We introduce a method for probabilistically evaluating the reliability of DIRECT optimisation under a constrained computational budget, a context frequently encountered in various applications. By interpreting the slope data gathered during the optimisation process as samples from the objective function's derivative, we utilise Bayesian posterior prediction to derive a confidence score for the optimisation outcomes. We validate our approach through numerical experiments on four multidimensional test functions, and the results highlight its practicality and efficacy. 
Fu Wang · Zeyu Fu · Xiaowei Huang · Wenjie Ruan 🔗 


Understanding the Role of Optimization in Double Descent
(
Poster
)
>
link
The phenomenon of model-wise double descent, where the test error peaks and then decreases as the model size increases, is an intriguing topic that has attracted the attention of researchers due to the striking observed gap between theory and practice (Belkin et al., 2019). While double descent has been observed in various tasks and architectures, the peak of double descent can sometimes be noticeably absent or diminished, even without explicit regularization such as weight decay and early stopping. In this paper, we investigate this intriguing phenomenon from the perspective of optimization and propose a simple optimization-based explanation for why double descent sometimes occurs weakly or not at all. To the best of our knowledge, we are the first to demonstrate that many disparate factors contributing to model-wise double descent are unified from the viewpoint of optimization: model-wise double descent is observed if and only if the optimizer is able to find a sufficiently low-loss minimum. We conduct a series of controlled experiments on random feature models and two-layer neural networks under various optimization settings that demonstrate this optimization-based unified view. Our results suggest the following implication: double descent is unlikely to be a problem for real-world machine learning setups. Additionally, our results help explain the gap between the weak double descent peaks seen in practice and the strong peaks observable in carefully designed setups. 
Chris Liu · Jeffrey Flanigan 🔗 


Variance-Reduced Model-Based Methods: New Rates and Adaptive Step Sizes
(
Poster
)
>
link
Variance-reduced gradient methods were introduced to control the variance of SGD (stochastic gradient descent). Model-based methods are able to make use of a known lower bound on the loss; for instance, most loss functions are positive. We show how these two classes of methods can be seamlessly combined. As an example, we present a model-based Stochastic Average Gradient method, MSAG, which results from using a truncated model together with the SAG method. At each iteration, MSAG computes an adaptive learning rate based on a given known lower bound. When given access to the optimal objective as the lower bound, MSAG has several favorable convergence properties, including monotonic iterates and convergence in the nonsmooth, smooth, and strongly convex settings. This shows that we can essentially trade off knowing the smoothness constant $L_{\max}$ for knowing the optimal objective to achieve the favourable convergence of variance-reduced gradient methods.

Robert Gower · Frederik Kunstner · Mark Schmidt 🔗 
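The adaptive learning rate induced by a truncated model with a known lower bound reduces, in its simplest form, to a Polyak-type step. The sketch below shows only that bare mechanism on a toy quadratic, not MSAG itself (which combines it with SAG's averaged gradients); the objective and iteration count are assumptions.

```python
import numpy as np

def polyak_step(f_val, grad, lower_bound):
    """Adaptive step from a truncated linear model: gamma = (f(x) - lb) / ||g||^2,
    the step that drives the linearized model exactly down to the known lower bound."""
    g2 = np.dot(grad, grad)
    return 0.0 if g2 == 0 else (f_val - lower_bound) / g2

# Minimize f(x) = ||x||^2, whose optimal value 0 serves as the known lower bound.
x = np.array([3.0, -4.0])
for _ in range(50):
    g = 2 * x
    x = x - polyak_step(np.dot(x, x), g, 0.0) * g
```

On this quadratic the step is constantly 1/4, so each iteration halves the iterate; no smoothness constant is ever supplied, matching the trade-off described in the abstract.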


On the convergence of warped proximal iterations for solving nonmonotone inclusions and applications
(
Poster
)
>
link
In machine learning, addressing fairness, robustness, and safety requires solving nonconvex optimization problems with various constraints. In this paper, we investigate warped proximal iterations for solving nonmonotone inclusions and their application to nonconvex quadratic programming (QP) with equality constraints. 
Dimitri Papadimitriou · Bang Cong Vu 🔗 


On the Synergy Between Label Noise and Learning Rate Annealing in Neural Network Training
(
Poster
)
>
link
In the past decade, stochastic gradient descent (SGD) has emerged as one of the most dominant algorithms in neural network training, with enormous success in different application scenarios. However, the implicit bias of SGD under different training techniques is still underexplored. Some of the common heuristics in practice include 1) using large initial learning rates and decaying them as training progresses, and 2) using minibatch SGD instead of full-batch gradient descent. In this work, we show that under certain data distributions, these two techniques are both necessary to obtain good generalization in neural networks. We consider minibatch SGD with label noise, and at the heart of our analysis lies the concept of feature learning order, which has previously been characterized theoretically by Li et al. (2019) and Abbe et al. (2021). Notably, we use this to give the first concrete separations in generalization guarantees between training neural networks with both label noise SGD and learning rate annealing and training with one of these elements removed. 
Stanley Wei · Tongzheng Ren · Simon Du 🔗 


Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness
(
Poster
)
>
link
In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. We propose a novel objective that maximizes expected relevance only over those rankings that satisfy given representation constraints, ensuring ex-post fairness. Building upon recent work on an efficient sampler for ex-post group-fair rankings, we propose a group-fair Plackett-Luce model and show that it can be efficiently optimized for our objective in the LTR framework. Experiments on three real-world datasets show that our algorithm guarantees fairness while usually achieving better relevance than the LTR baselines. In addition, our algorithm also achieves better relevance than post-processing baselines which also ensure ex-post fairness. Further, when implicit bias is injected into the training data, our algorithm typically outperforms existing LTR baselines in relevance. 
Sruthi Gorantla · Eshaan Bhansali · Amit Deshpande · Anand Louis 🔗 
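The base Plackett-Luce model the paper builds on can be sampled efficiently with the Gumbel-max trick: adding independent Gumbel noise to the item scores and sorting is equivalent to sequential PL sampling. The group-fairness constraints of the proposed sampler are beyond this sketch, and the scores below are toy values.

```python
import numpy as np

def sample_pl_ranking(scores, rng):
    """Sample a full ranking from a Plackett-Luce model with item scores s_i
    (selection probabilities proportional to exp(s_i)) via the Gumbel trick."""
    gumbel = -np.log(-np.log(rng.uniform(size=len(scores))))
    return np.argsort(-(scores + gumbel))   # descending perturbed scores

rng = np.random.default_rng(0)
scores = np.array([2.0, 0.0, -2.0])
counts = np.zeros(3)
for _ in range(5000):
    counts[sample_pl_ranking(scores, rng)[0]] += 1
freq = counts / 5000                         # empirical top-1 frequencies
```

The top-1 frequencies should match the softmax of the scores (about 0.87, 0.12, 0.02 here), which is exactly the first-position marginal of the PL model.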


Contrastive Predict-and-Search for Mixed Integer Linear Programs
(
Poster
)
>
link
Mixed integer linear programs (MILPs) are a flexible and powerful tool for modeling and solving many difficult real-world combinatorial optimization problems. In this paper, we propose ConPaS, a novel machine-learning-based framework that learns to predict solutions to MILPs with contrastive learning. For training, we collect high-quality solutions as positive samples and low-quality or infeasible solutions as negative samples. We then learn to make discriminative predictions by contrasting the positive and negative samples. At test time, we predict assignments for a subset of integer variables of a MILP and then solve the resulting reduced MILP to construct high-quality solutions. Empirically, we show that ConPaS achieves state-of-the-art results compared to other ML-based approaches, in terms of both the quality of the solutions found and the speed at which they are found. 
Taoan Huang · Aaron Ferber · Arman Zharmagambetov · Yuandong Tian · Bistra Dilkina 🔗 


Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle
(
Poster
)
>
link
Recent advances in deep learning have given us some very promising results on the generalization ability of deep neural networks; however, the literature still lacks a comprehensive theory explaining why heavily overparametrized models are able to generalize well while fitting the training data. In this paper we propose a PAC-type bound on the generalization error of feedforward ReLU networks via estimating the Rademacher complexity of the set of networks reachable from an initial parameter vector via gradient descent. The key idea is to bound the sensitivity of the network's gradient to perturbation of the input data along the optimization trajectory. The obtained bound does not explicitly depend on the depth of the network. Our results are experimentally verified on the MNIST and CIFAR-10 datasets. 
Dániel Rácz · Mihaly Petreczky · Balint Daroczy 🔗 


Riemannian Optimization for Euclidean Distance Geometry
(
Poster
)
>
link
The Euclidean distance geometry (EDG) problem is a crucial machine learning task that appears in many applications. Utilizing the pairwise Euclidean distance information of a given point set, EDG reconstructs the configuration of the point system. When only partial distance information is available, matrix completion techniques can be incorporated to fill in the missing pairwise distances. In this paper, we propose a novel dual-basis Riemannian gradient descent algorithm, coined RieEDG, for the EDG completion problem. Numerical experiments verify the effectiveness of the proposed algorithm. In particular, we show that RieEDG can precisely reconstruct various datasets consisting of 2- and 3-dimensional points by accessing a small fraction of the pairwise distance information. 
Chandler Smith · Samuel Lichtenberg · HanQin Cai · Abiy Tasissa 🔗 


GUC: Unsupervised nonparametric Global Clustering and Anomaly Detection
(
Poster
)
>
link
Anomaly detection is a crucial task in the fields of optimization and Machine Learning, with the ability to detect global anomalies being of particular importance. In this paper, we propose a novel nonparametric algorithm for automatically detecting global anomalies in an unsupervised manner. Our algorithm is both effective and efficient, requiring no prior assumptions or domain knowledge to be applied. It features two modes that utilize the distance from the dataset's center for grouping data points together. The first mode splits the dataset into global clusters where each cluster signifies proximity to the center. The second mode employs a threshold value for splitting the points into outliers and inliers. We evaluate our proposal against other prominent methods using synthetic and real datasets. Our experiments demonstrate that the proposed algorithm achieves state-of-the-art performance with minimal computational cost and can successfully be applied to a wide range of Machine Learning applications. 
Chris Solomou 🔗 


Testing Approximate Stationarity Concepts for Piecewise Smooth Functions
(
Poster
)
>
link
We study various aspects of the fundamental computational problem of testing stationarity for piecewise smooth functions. Our contributions are:
* Hardness: We show that checking first-order approximate stationarity concepts, in terms of different subdifferential constructions, for a piecewise linear function is strongly coNP-hard. As a corollary, we prove that testing FOM for the abs-normal form is computationally hard, confirming a conjecture by Griewank and Walther [14, SIAM J. Optim., 29 (2019), pp. 284].
* Regularity: We establish a necessary and sufficient condition for the validity of an equality-type subdifferential sum rule for Sum-of-Difference-of-Max (SDM) functions, using tools from generalized differentiation theory and polytope theory. This SDM class underlies the analysis of many important nonsmooth piecewise differentiable functions, including the empirical loss of two-layer ReLU networks. In particular, we show that this condition is efficiently checkable when the subdifferential set is of zonotope type.
* Robust algorithms: We introduce the first oracle-polynomial-time algorithm to test near-approximate stationarity for an SDM function. When specialized to the empirical loss of two-layer ReLU networks, our new algorithm is the first practical and robust stationarity test, tackling an open issue in the work of Yun et al. [29, ICLR (2019)]. 
Lai Tian · Anthony ManCho So 🔗 


Multihead CLIP: Improving CLIP with Diverse Representations and Flat Minima
(
Poster
)
>
link
Contrastive Language-Image Pretraining (CLIP) has shown remarkable success in the field of multimodal learning by enabling joint understanding of text and images. In this paper, we introduce a novel method called Multihead CLIP, inspired by Stein Variational Gradient Descent (SVGD) and Sharpness-Aware Minimization (SAM). Our approach aims to enhance CLIP's learning capability by encouraging the model to acquire diverse features while also promoting convergence towards a flat loss region, resulting in improved generalization performance. We conduct extensive experiments on two benchmark datasets, YFCC15M and CC3M, to evaluate the effectiveness of our proposed method. The experimental results consistently demonstrate that Multihead CLIP outperforms both the original CLIP architecture and CLIP with the SAM optimizer. 
Mo Zhou · Xiong Zhou · Erran Li Li · Stefano Ermon · Rong Ge 🔗 


DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep Networks
(
Poster
)
>
link
Deep learning models have become increasingly computationally intensive, necessitating specialized hardware and significant runtime for both training and inference. In this work, we introduce DynaLay, a versatile and dynamic neural network architecture that employs a reinforcement learning agent to adaptively select which layers to execute for a given input. Our approach introduces an element of introspection into neural network architectures by enabling the model to recompute the results on more difficult inputs during inference, balancing the amount of expended computation and optimizing for both performance and efficiency. The system comprises a main model constructed with Fixed-Point Iterative (FPI) layers, which can approximate complex functions with high fidelity, and an agent that chooses among these layers or a no-operation (NOP) action. Unique to our approach is a multifaceted reward function that combines classification accuracy, computational time, and a penalty for redundant layer selection, thereby ensuring a harmonious trade-off between performance and cost. Experimental results demonstrate that DynaLay achieves comparable accuracy to conventional deep models while significantly reducing computational overhead. Our approach represents a significant step toward creating more efficient, adaptable, and universally applicable deep learning systems. 
Mrinal Mathur · Sergey Plis 🔗 


Optimal Transport for Kernel Gaussian Mixture Models
(
Poster
)
>
link
The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture models, have been developed to enhance the capability of modeling complex multimodal densities of real datasets. However, very few studies have explored OMT problems in a reproducing kernel Hilbert space (RKHS), wherein the kernel trick is utilized to avoid the need to explicitly map input data into a high-dimensional feature space. In the current study, we propose a Wasserstein-type metric to compute the distance between two Gaussian mixtures in an RKHS via the kernel trick, i.e., kernel Gaussian mixture models. 
Jung Hun Oh · Rena Elkin · Anish Simhal · Jiening Zhu · Joseph Deasy · Allen Tannenbaum 🔗 


Stochastic Optimization under Hidden Convexity
(
Poster
)
>
link
In this work, we consider stochastic nonconvex constrained optimization problems under hidden convexity, i.e., those that admit a convex reformulation via a black-box (nonlinear, but invertible) map $c: \mathcal{X} \rightarrow \mathcal{U}$. A number of nonconvex problems, ranging from optimal control, revenue and inventory management, to convex reinforcement learning, admit such a hidden convex structure. Unfortunately, in the majority of the considered applications, the map $c(\cdot)$ is unavailable and therefore the reduction to solving a convex optimization problem is not possible. On the other hand, the (stochastic) gradients with respect to the original variable $x \in \mathcal{X}$ are often easy to obtain. Motivated by these observations, we consider projected stochastic (sub)gradient methods under hidden convexity and provide the first sample complexity guarantees for global convergence in the smooth and nonsmooth settings. Additionally, we strengthen our results to last-iterate function-value convergence in the smooth setting using the momentum variant of projected stochastic gradient descent.

Ilyas Fatkhullin · Niao He · Yifan Hu 🔗 


On Optimization Formulations of Finite Horizon MDPs
(
Poster
)
>
link
In this paper, we extend the connection between linear programming formulations of MDPs and policy gradient methods, established for infinite-horizon MDPs in (Ying & Zhu, 2020), to finite-horizon MDPs. The main tool we use for this extension is a reduction from optimization formulations of finite-horizon MDPs to infinite-horizon MDPs. Additionally, we show via a reparameterization argument that the KKT conditions for the nonconvex policy optimization problem in the finite-horizon setting are sufficient for global optimality. Further, we use the reduction to extend the Quasi-Newton policy gradient algorithm of (Li et al., 2021) to the finite-horizon case and achieve performance competitive with value iteration by exploiting backward induction for policy evaluation. To our knowledge, this serves as the first policy-gradient-based method for finite-horizon MDPs that is competitive with value-iteration-based approaches. 
Rajat Vadiraj Dwaraknath · Lexing Ying 🔗 
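The backward induction used for the finite-horizon setting can be sketched as exact dynamic programming over the horizon. The two-state toy MDP below is an illustrative assumption, not from the paper.

```python
import numpy as np

def backward_induction(P, R, H):
    """Finite-horizon optimal values by backward induction:
    V_H = 0, then V_h(s) = max_a [ R[s,a] + sum_s' P[a,s,s'] * V_{h+1}(s') ]."""
    V = np.zeros(R.shape[0])
    for _ in range(H):
        Q = R + np.einsum('asn,n->sa', P, V)   # Q[s,a] from one-step lookahead
        V = Q.max(axis=1)
    return V

# Toy 2-state MDP: action 1 moves state 0 -> state 1; state 1 absorbs and pays 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],        # action 0: stay put
              [[0.0, 1.0], [0.0, 1.0]]])       # action 1: go to / stay in state 1
R = np.array([[0.0, 0.0],                      # state 0 pays nothing
              [1.0, 1.0]])                     # state 1 pays 1 per step
V = backward_induction(P, R, H=3)
```

With horizon 3, the optimal plan from state 0 spends one step moving and then collects reward twice, so the values are [2, 3]; the same sweep evaluates a fixed policy if the max over actions is replaced by the policy's action.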


Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
(
Poster
)
>
link
Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided Language Models) solve problems by providing a "scaffolding" program that structures multiple calls to language models to generate better outputs. A scaffolding program is written in a programming language such as Python. In this work, we use a language-model-infused scaffolding program to improve itself. We start with a seed "improver" that improves an input program according to a given utility function by querying a language model several times and returning the best solution. We then run this seed improver to improve itself. Across a small set of downstream tasks, the resulting improved improver generates programs with significantly better performance than its seed improver. Afterward, we analyze the variety of self-improvement strategies proposed by the language model, including beam search, genetic algorithms, and simulated annealing. Since the language models themselves are not altered, this is not full recursive self-improvement. Nonetheless, it demonstrates that a modern language model, GPT-4 in our proof-of-concept experiments, is capable of writing code that can call itself to improve itself. We critically consider concerns around the development of self-improving technologies and evaluate the frequency with which the generated code bypasses a sandbox. 
Eric Zelikman · Eliana Lorch · Lester Mackey · Adam Tauman Kalai 🔗 
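The seed improver's inner loop, query a model for candidate revisions and keep the best under the utility, can be sketched with a stub in place of the language model. The recursive part (running the improver on its own source) is omitted, and the stub model and numeric "program" are toy assumptions.

```python
import random

def improve(program, utility, language_model, n_candidates=4):
    """Seed improver: ask the 'model' for candidate revisions of `program`
    and keep the highest-utility one (the input itself is always a candidate,
    so utility never decreases)."""
    candidates = [program] + [language_model(program) for _ in range(n_candidates)]
    return max(candidates, key=utility)

# Stub 'language model' for illustration: a program is just a number it perturbs.
rng = random.Random(0)
def stub_lm(program):
    return program + rng.choice([-1, 1]) * rng.random()

utility = lambda p: -abs(p - 10.0)           # toy utility: closeness to 10
program = 0.0
for _ in range(200):
    program = improve(program, utility, stub_lm)
```

Because the current program is always among the candidates, repeated calls perform a monotone best-of-n hill climb on the utility; STOP's contribution is to make the improver itself the program being improved.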


Learning Multi-Objective Optimization Problems Through Online Learning
(
Poster
)
>
link
We investigate the problem of learning the parameters (i.e., objective functions or constraints) of a multi-objective optimization problem from a set of sequentially arriving solutions. In particular, these solutions might not be exact and may carry measurement noise or be generated under the bounded rationality of decision makers. In this paper, we propose a general online learning framework to deal with this learning problem using inverse multi-objective optimization, and prove that this framework converges at a rate of under certain regularity conditions. More precisely, we develop two online learning algorithms with implicit update rules which can handle noisy data. Numerical results with both synthetic and real-world datasets show that both algorithms can learn the parameters of a multi-objective program with great accuracy and are robust to noise. 
Chaosheng Dong · Yijia Wang · Bo Zeng 🔗 