Order up! The Benefits of Higher-Order Optimization in Machine Learning

Workshop

Albert Berahas · Jelena Diakonikolas · Jarad Forristal · Brandon Reese · Martin Takac · Yan Xu

Fri 2 Dec, 6:15 a.m. PST

[ Contact: hoo.ml.workshop@gmail.com ]

Optimization is a cornerstone of nearly all modern machine learning (ML) and deep learning (DL). Simple first-order gradient-based methods dominate the field for convincing reasons: low computational cost, simplicity of implementation, and strong empirical results.

Yet second- or higher-order methods are rarely used in DL, despite also having many strengths: faster per-iteration convergence, frequent explicit regularization on step-size, and better parallelization than SGD. Additionally, many scientific fields use second-order optimization with great success.

A driving factor for this is the large difference in development effort. By the time higher-order methods were tractable for DL, first-order methods such as SGD and its main variants (SGD + Momentum, Adam, …) already had many years of maturity and mass adoption.

The purpose of this workshop is to address this gap, to create an environment where higher-order methods are fairly considered and compared against one another, and to foster healthy discussion with the end goal of mainstream acceptance of higher-order methods in ML and DL.

Timezone: America/Los_Angeles

Schedule

Fri 6:15 a.m. - 6:30 a.m.	Welcome and Opening Remarks (Opening Remarks)
Fri 6:30 a.m. - 7:15 a.m.	Efficient Second-Order Stochastic Methods for Machine Learning (Plenary Talk)	Donald Goldfarb
Fri 7:15 a.m. - 8:00 a.m.	Tensor Methods for Nonconvex Optimization (Plenary Talk)	Coralia Cartis
Fri 8:00 a.m. - 8:30 a.m.	Coffee Break
Fri 8:00 a.m. - 8:30 a.m.	Poster Session I (Poster Session)
Fri 8:30 a.m. - 8:45 a.m.	Quartic Polynomial Sub-problem Solutions in Tensor Methods for Nonconvex Optimization (Spotlight Talk)	Wenqi Zhu
Fri 8:45 a.m. - 9:00 a.m.	PSPS: Preconditioned Stochastic Polyak Step-size method for badly scaled data (Spotlight Talk)	Farshed Abdukhakimov
Fri 9:00 a.m. - 9:15 a.m.	DRSOM: A Dimension Reduced Second-Order Method (Spotlight Talk)	Chuwen Zhang
Fri 9:15 a.m. - 9:30 a.m.	Disentangling the Mechanisms Behind Implicit Regularization in SGD (Spotlight Talk)	Zachary Novack
Fri 9:30 a.m. - 9:45 a.m.	Cubic Regularized Quasi-Newton Methods (Spotlight Talk)	Klea Ziu
Fri 9:45 a.m. - 10:00 a.m.	Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization (Spotlight Talk)	Kabir Chandrasekher
Fri 10:00 a.m. - 11:30 a.m.	Lunch Break
Fri 11:30 a.m. - 12:15 p.m.	Deterministically Constrained Stochastic Optimization (Plenary Talk)	Frank E. Curtis
Fri 12:15 p.m. - 1:00 p.m.	Low Rank Approximation for Faster Convex Optimization (Plenary Talk)	Madeleine Udell
Fri 1:00 p.m. - 1:30 p.m.	Coffee Break
Fri 1:00 p.m. - 2:00 p.m.	Poster Session II (Poster Session)
Fri 2:00 p.m. - 2:45 p.m.	A Fast, Fisher Based Pruning of Transformers without Retraining (Plenary Talk)	Amir Gholami
Fri 2:45 p.m. - 3:00 p.m.	Closing Remarks (Closing Remarks)
-	How Small Amount of Data Sharing Benefits Higher-Order Distributed Optimization and Learning (Poster)	Mingxi Zhu · Yinyu Ye
-	A Stochastic Conjugate Subgradient Algorithm for Kernelized Support Vector Machines: The Evidence (Poster)	Di Zhang · Suvrajeet Sen
-	Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation (Poster)	Rustem Islamov · Xun Qian · Slavomír Hanzely · Mher Safaryan · Peter Richtarik
-	Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems (Poster)	Yuchen Fang · Sen Na · Mladen Kolar
-	Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models: First-Order Stationarity (Poster)	Yuchen Fang · Sen Na · Mladen Kolar
-	Black Box Lie Group Preconditioners for SGD (Poster)	Xilin Li
-	Perseus: A Simple and Optimal High-Order Method for Variational Inequalities (Poster)	Tianyi Lin · Michael Jordan
-	High-Order Optimization of Gradient Boosted Decision Trees (Poster)	Jean Pachebat · Sergey Ivanov
-	On the Global Convergence of the Regularized Generalized Gauss-Newton Algorithm (Poster)	Vincent Roulet · Maryam Fazel · Siddhartha Srinivasa · Zaid Harchaoui
-	Disentangling the Mechanisms Behind Implicit Regularization in SGD (Poster)	Zachary Novack · Simran Kaur · Tanya Marwah · Saurabh Garg · Zachary Lipton
-	Effects of momentum scaling for SGD (Poster)	Dmitry A. Pasechnyuk · Alexander Gasnikov · Martin Takac
-	Using quadratic equations for overparametrized models (Poster)	Shuang Li · William Swartworth · Martin Takac · Deanna Needell · Robert Gower
-	Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization (Poster)	Mengqi Lou · Kabir Chandrasekher · Ashwin Pananjady
-	Improving Levenberg-Marquardt Algorithm for Neural Networks (Poster)	Omead Pooladzandi · Yiming Zhou
-	DRSOM: A Dimension Reduced Second-Order Method (Poster)	Chuwen Zhang · Jiang Bo · Chang He · Yuntian Jiang · Dongdong Ge · Yinyu Ye
-	Random-subspace adaptive cubic regularisation method for nonconvex optimisation (Poster)	Coralia Cartis · Zhen Shao
-	Quartic Polynomial Sub-problem Solutions in Tensor Methods for Nonconvex Optimization (Poster)	Wenqi Zhu · Coralia Cartis
-	FLECS-CGD: A Federated Learning Second-Order Framework via Compression and Sketching with Compressed Gradient Differences (Poster)	Artem Agafonov · Brahim Erraji · Martin Takac
-	Statistical and Computational Complexities of BFGS Quasi-Newton Method for Generalized Linear Models (Poster)	Qiujiang Jin · Aryan Mokhtari · Nhat Ho · Tongzheng Ren
-	The Trade-offs of Incremental Linearization Algorithms for Nonsmooth Composite Problems (Poster)	Krishna Pillutla · Vincent Roulet · Sham Kakade · Zaid Harchaoui
-	Cubic Regularized Quasi-Newton Methods (Poster)	Dmitry Kamzolov · Klea Ziu · Artem Agafonov · Martin Takac
-	ASDL: A Unified Interface for Gradient Preconditioning in PyTorch (Poster)	Kazuki Osawa · Satoki Ishikawa · Rio Yokota · Shigang Li · Torsten Hoefler
-	PSPS: Preconditioned Stochastic Polyak Step-size method for badly scaled data (Poster)	Farshed Abdukhakimov · Chulu Xiang · Dmitry Kamzolov · Robert Gower · Martin Takac
-	HesScale: Scalable Computation of Hessian Diagonals (Poster)	Mohamed Elsayed · Rupam Mahmood