Timezone: »
Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains linear convergence for an important class of non-convex functions and saddle-point problems satisfying interpolation. To improve the proposed methods' practical performance, we give heuristics to use larger step-sizes and acceleration. We compare the proposed algorithms against numerous optimization methods on standard classification tasks using both kernel methods and deep networks. The proposed methods result in competitive performance across all models and datasets, while being robust to the precise choices of hyper-parameters. For multi-class classification using deep networks, SGD with Armijo line-search results in both faster convergence and better generalization.
Author Information
Sharan Vaswani (Mila, Université de Montréal)
Aaron Mishkin (University of British Columbia)
Issam Laradji (University of British Columbia)
Mark Schmidt (University of British Columbia)
Gauthier Gidel (Mila)
I am a Ph.D student supervised by Simon Lacoste-Julien, I graduated from ENS Ulm and Université Paris-Saclay. I was a visiting PhD student at Sierra. I also worked for 6 months as a freelance Data Scientist for Monsieur Drive (Acquired by Criteo) and I recently co-founded a startup called Krypto. I'm currently pursuing my PhD at Mila. My work focuses on optimization applied to machine learning. More details can be found in my resume. My research is to develop new optimization algorithms and understand the role of optimization in the learning procedure, in short, learn faster and better. I identify to the field of machine learning (NIPS, ICML, AISTATS and ICLR) and optimization (SIAM OP)
Simon Lacoste-Julien (Mila, Université de Montréal & SAIL Montreal)
Simon Lacoste-Julien is an associate professor at Mila and DIRO from Université de Montréal, and Canada CIFAR AI Chair holder. He also heads part time the SAIT AI Lab Montreal from Samsung. His research interests are machine learning and applied math, with applications in related fields like computer vision and natural language processing. He obtained a B.Sc. in math., physics and computer science from McGill, a PhD in computer science from UC Berkeley and a post-doc from the University of Cambridge. He spent a few years as a research faculty at INRIA and École normale supérieure in Paris before coming back to his roots in Montreal in 2016 to answer the call from Yoshua Bengio in growing the Montreal AI ecosystem.
More from the Same Authors
-
2021 Spotlight: A single gradient step finds adversarial examples on random two-layers neural networks »
Sebastien Bubeck · Yeshwanth Cherapanamjeri · Gauthier Gidel · Remi Tachet des Combes -
2021 : Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers »
Jacques Chen · Frederik Kunstner · Mark Schmidt -
2021 : Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers »
Jacques Chen · Frederik Kunstner · Mark Schmidt -
2021 : Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent »
Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad Harikandeh -
2021 : Faster Quasi-Newton Methods for Linear Composition Problems »
Betty Shea · Mark Schmidt -
2021 : On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging »
Chris Junchi Li · Yaodong Yu · Nicolas Loizou · Gauthier Gidel · Yi Ma · Nicolas Le Roux perso · Michael Jordan -
2021 : On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging »
Chris Junchi Li · Yaodong Yu · Nicolas Loizou · Gauthier Gidel · Yi Ma · Nicolas Le Roux perso · Michael Jordan -
2021 : A Closer Look at Gradient Estimators with Reinforcement Learning as Inference »
Jonathan Lavington · Michael Teng · Mark Schmidt · Frank Wood -
2021 : An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control »
Nicholas Ioannidis · Jonathan Lavington · Mark Schmidt -
2022 : Nesterov Meets Optimism: Rate-Optimal Optimistic-Gradient-Based Method for Stochastic Bilinearly-Coupled Minimax Optimization »
Chris Junchi Li · Angela Yuan · Gauthier Gidel · Michael Jordan -
2022 : Target-based Surrogates for Stochastic Optimization »
Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux -
2022 : Momentum Extragradient is Optimal for Games with Cross-Shaped Spectrum »
Junhyung Lyle Kim · Gauthier Gidel · Anastasios Kyrillidis · Fabian Pedregosa -
2022 : Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint »
Amrutha Varshini Ramesh · Aaron Mishkin · Mark Schmidt -
2022 : Fast Convergence of Random Reshuffling under Interpolation and the Polyak-Łojasiewicz Condition »
Chen Fan · Christos Thrampoulidis · Mark Schmidt -
2022 : Unlocking Slot Attention by Changing Optimal Transport Costs »
Yan Zhang · David Zhang · Simon Lacoste-Julien · Gertjan Burghouts · Cees Snoek -
2022 : Performative Prediction with Neural Networks »
Mehrnaz Mofakhami · Ioannis Mitliagkas · Gauthier Gidel -
2022 : Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates »
Wu Lin · Valentin Duruisseaux · Melvin Leok · Frank Nielsen · Mohammad Emtiyaz Khan · Mark Schmidt -
2022 : Unlocking Slot Attention by Changing Optimal Transport Costs »
Yan Zhang · David Zhang · Simon Lacoste-Julien · Gertjan Burghouts · Cees Snoek -
2023 Poster: Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking »
Frederik Kunstner · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey -
2023 Poster: Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models »
Leonardo Galli · Holger Rauhut · Mark Schmidt -
2023 Poster: Feature Likelihood Score: Evaluating the Generalization of Generative Models Using Samples »
Marco Jiralerspong · Joey Bose · Ian Gemp · Chongli Qin · Yoram Bachrach · Gauthier Gidel -
2023 Poster: BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization »
Chen Fan · Gaspard Choné-Ducasse · Mark Schmidt · Christos Thrampoulidis -
2023 Poster: Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure »
Angela Yuan · Chris Junchi Li · Gauthier Gidel · Michael Jordan · Quanquan Gu · Simon Du -
2023 Poster: Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation »
Sébastien Lachapelle · Divyat Mahajan · Ioannis Mitliagkas · Simon Lacoste-Julien -
2023 Poster: Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees »
Sharan Vaswani · Amirreza Kazemi · Reza Babanezhad Harikandeh · Nicolas Le Roux -
2023 Oral: Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation »
Sébastien Lachapelle · Divyat Mahajan · Ioannis Mitliagkas · Simon Lacoste-Julien -
2022 : Unlocking Slot Attention by Changing Optimal Transport Costs »
Yan Zhang · David Zhang · Simon Lacoste-Julien · Gertjan Burghouts · Cees Snoek -
2022 : Poster Session 1 »
Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li -
2022 Poster: Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints »
Jose Gallego-Posada · Juan Ramirez · Akram Erraqabi · Yoshua Bengio · Simon Lacoste-Julien -
2022 Poster: Data-Efficient Structured Pruning via Submodular Optimization »
Marwa El Halabi · Suraj Srinivas · Simon Lacoste-Julien -
2022 Poster: Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise »
Eduard Gorbunov · Marina Danilova · David Dobre · Pavel Dvurechenskii · Alexander Gasnikov · Gauthier Gidel -
2022 Poster: The Curse of Unrolling: Rate of Differentiating Through Optimization »
Damien Scieur · Gauthier Gidel · Quentin Bertrand · Fabian Pedregosa -
2022 Poster: Beyond L1: Faster and Better Sparse Models with skglm »
Quentin Bertrand · Quentin Klopfenstein · Pierre-Antoine Bannier · Gauthier Gidel · Mathurin Massias -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2022 Poster: Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution »
Antonio Orvieto · Simon Lacoste-Julien · Nicolas Loizou -
2022 Poster: Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities »
Eduard Gorbunov · Adrien Taylor · Gauthier Gidel -
2021 : Poster Session 1 (gather.town) »
Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani -
2021 Poster: Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity »
Nicolas Loizou · Hugo Berard · Gauthier Gidel · Ioannis Mitliagkas · Simon Lacoste-Julien -
2021 Poster: A single gradient step finds adversarial examples on random two-layers neural networks »
Sebastien Bubeck · Yeshwanth Cherapanamjeri · Gauthier Gidel · Remi Tachet des Combes -
2020 : Closing remarks »
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac -
2020 : Live Q&A with Michael Friedlander (Zoom) »
Mark Schmidt -
2020 : Intro to Invited Speaker 8 »
Mark Schmidt -
2020 : Contributed talks in Session 3 (Zoom) »
Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang -
2020 : Live Q&A with Rachel Ward (Zoom) »
Mark Schmidt -
2020 : Live Q&A with Ashia Wilson (Zoom) »
Mark Schmidt -
2020 : Welcome remarks to Session 3 »
Mark Schmidt -
2020 : Poster Session 2 (gather.town) »
Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang -
2020 : Contributed talks in Session 2 (Zoom) »
Martin Takac · Samuel Horváth · Guan-Horng Liu · Nicolas Loizou · Sharan Vaswani -
2020 : Contributed Video: Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search), Sharan Vaswani »
Sharan Vaswani -
2020 : Contributed Video: How to make your optimizer generalize better, Sharan Vaswani »
Sharan Vaswani -
2020 Workshop: OPT2020: Optimization for Machine Learning »
Courtney Paquette · Mark Schmidt · Sebastian Stich · Quanquan Gu · Martin Takac -
2020 : Welcome event (gather.town) »
Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac -
2020 Poster: Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses »
Yihan Zhou · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey -
2020 Poster: Adversarial Example Games »
Joey Bose · Gauthier Gidel · Hugo Berard · Andre Cianflone · Pascal Vincent · Simon Lacoste-Julien · Will Hamilton -
2020 Poster: Differentiable Causal Discovery from Interventional Data »
Philippe Brouillard · Sébastien Lachapelle · Alexandre Lacoste · Simon Lacoste-Julien · Alexandre Drouin -
2020 Spotlight: Differentiable Causal Discovery from Interventional Data »
Philippe Brouillard · Sébastien Lachapelle · Alexandre Lacoste · Simon Lacoste-Julien · Alexandre Drouin -
2020 Poster: Real World Games Look Like Spinning Tops »
Wojciech Czarnecki · Gauthier Gidel · Brendan Tracey · Karl Tuyls · Shayegan Omidshafiei · David Balduzzi · Max Jaderberg -
2019 Workshop: Bridging Game Theory and Deep Learning »
Ioannis Mitliagkas · Gauthier Gidel · Niao He · Reyhane Askari Hemmat · N H · Nika Haghtalab · Simon Lacoste-Julien -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Reducing Noise in GAN Training with Variance Reduced Extragradient »
Tatjana Chavdarova · Gauthier Gidel · François Fleuret · Simon Lacoste-Julien -
2019 Poster: Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks »
Gauthier Gidel · Francis Bach · Simon Lacoste-Julien -
2019 Poster: Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics »
Giancarlo Kerg · Kyle Goyette · Maximilian Puelma Touzel · Gauthier Gidel · Eugene Vorontsov · Yoshua Bengio · Guillaume Lajoie -
2018 : Opening remarks »
Simon Lacoste-Julien · Gauthier Gidel -
2018 Workshop: Smooth Games Optimization and Machine Learning »
Simon Lacoste-Julien · Ioannis Mitliagkas · Gauthier Gidel · Vasilis Syrgkanis · Eva Tardos · Leon Bottou · Sebastian Nowozin -
2018 Poster: SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient »
Aaron Mishkin · Frederik Kunstner · Didrik Nielsen · Mark Schmidt · Mohammad Emtiyaz Khan -
2018 Poster: Quantifying Learning Guarantees for Convex but Inconsistent Surrogates »
Kirill Struminsky · Simon Lacoste-Julien · Anton Osokin -
2017 : A3T: Adversarially Augmented Adversarial Training »
Aristide Baratin · Simon Lacoste-Julien · Yoshua Bengio · Akram Erraqabi -
2017 : On Structured Prediction Theory with Calibrated Convex Surrogate Losses. »
Simon Lacoste-Julien -
2017 Poster: Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization »
Fabian Pedregosa · Rémi Leblond · Simon Lacoste-Julien -
2017 Poster: On Structured Prediction Theory with Calibrated Convex Surrogate Losses »
Anton Osokin · Francis Bach · Simon Lacoste-Julien -
2017 Spotlight: Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization »
Fabian Pedregosa · Rémi Leblond · Simon Lacoste-Julien -
2017 Oral: On Structured Prediction Theory with Calibrated Convex Surrogate Losses »
Anton Osokin · Francis Bach · Simon Lacoste-Julien -
2016 : Fast Patch-based Style Transfer of Arbitrary Style »
Tian Qi Chen · Mark Schmidt -
2016 Poster: PAC-Bayesian Theory Meets Bayesian Inference »
Pascal Germain · Francis Bach · Alexandre Lacoste · Simon Lacoste-Julien -
2015 Poster: On the Global Linear Convergence of Frank-Wolfe Optimization Variants »
Simon Lacoste-Julien · Martin Jaggi -
2015 Poster: Barrier Frank-Wolfe for Marginal Inference »
Rahul G Krishnan · Simon Lacoste-Julien · David Sontag -
2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors »
Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams -
2015 Poster: Rethinking LDA: Moment Matching for Discrete ICA »
Anastasia Podosinnikova · Francis Bach · Simon Lacoste-Julien -
2015 Poster: StopWasting My Gradients: Practical SVRG »
Reza Babanezhad Harikandeh · Mohamed Osama Ahmed · Alim Virani · Mark Schmidt · Jakub Konečný · Scott Sallinen -
2014 Poster: SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives »
Aaron Defazio · Francis Bach · Simon Lacoste-Julien -
2009 Workshop: The Generative and Discriminative Learning Interface »
Simon Lacoste-Julien · Percy Liang · Guillaume Bouchard -
2008 Poster: DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification »
Simon Lacoste-Julien · Fei Sha · Michael Jordan