Reducing the variance in online optimization by transporting past gradients »
Most stochastic optimization methods use each gradient once before discarding it. While variance-reduction methods have shown that reusing past gradients can be beneficial when the number of datapoints is finite, they do not extend easily to the online setting. One issue is the staleness incurred by reusing past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT), which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
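The estimator described in the abstract can be illustrated on a one-dimensional quadratic, where the equal-Hessian condition holds exactly. The sketch below is a minimal reading of the idea, not the authors' code: the averaging weight gamma_t = t/(t+1), the extrapolation factor t, and the function names are our assumptions.

```python
import random

def igt_sgd(grad, x0, lr=0.1, steps=500):
    """SGD driven by an IGT-style gradient estimate (a sketch).

    At step t the stochastic gradient is evaluated at the extrapolated
    point z_t = x_t + t * (x_t - x_{t-1}) and folded into a running
    average v with weight gamma_t = t / (t + 1).  When all component
    Hessians are equal, this "transport" makes v an unbiased gradient
    at the current iterate whose noise is the average of all past noise
    terms, so its variance shrinks as 1/t.
    """
    x_prev, x = x0, x0
    v = 0.0
    for t in range(steps):
        gamma = t / (t + 1)
        z = x + t * (x - x_prev)      # transport: gamma / (1 - gamma) = t
        v = gamma * v + (1 - gamma) * grad(z)
        x_prev, x = x, x - lr * v
    return x

random.seed(0)
# Noisy gradient of the quadratic f(x) = 0.5 * (x - 3)^2.
noisy_grad = lambda x: (x - 3.0) + random.gauss(0.0, 1.0)
x_star = igt_sgd(noisy_grad, x0=0.0)
```

On this quadratic the running average v equals the true gradient at the current iterate plus the mean of all past noise samples, so the iterate settles much closer to the minimizer at 3 than plain SGD with the same step size would.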
Author Information
Sébastien Arnold (University of Southern California)
Pierre-Antoine Manzagol (Google)
Reza Babanezhad Harikandeh (UBC)
Ioannis Mitliagkas (Mila & University of Montreal)
Nicolas Le Roux (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Spotlight: Reducing the variance in online optimization by transporting past gradients »
  Wed. Dec 11th, 12:30 -- 12:35 AM, Room West Exhibition Hall A
More from the Same Authors
- 2021 Spotlight: Uniform Sampling over Episode Difficulty »
  Sébastien Arnold · Guneet Dhillon · Avinash Ravichandran · Stefano Soatto
- 2021 Spotlight: PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair »
  Zimin Chen · Vincent J Hellendoorn · Pascal Lamblin · Petros Maniatis · Pierre-Antoine Manzagol · Daniel Tarlow · Subhodeep Moitra
- 2022: Poly-S: Analyzing and Improving Polytropon for Data-Efficient Multi-Task Learning »
  Lucas Page-Caccia · Edoardo Maria Ponti · Liyuan Liu · Matheus Pereira · Nicolas Le Roux · Alessandro Sordoni
- 2022: Neural Networks Efficiently Learn Low-Dimensional Representations with SGD »
  Alireza Mousavi-Hosseini · Sejun Park · Manuela Girotti · Ioannis Mitliagkas · Murat Erdogdu
- 2022: Target-based Surrogates for Stochastic Optimization »
  Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux
- 2022: Performative Prediction with Neural Networks »
  Mehrnaz Mofakhami · Ioannis Mitliagkas · Gauthier Gidel
- 2022: Empirical Study on Optimizer Selection for Out-of-Distribution Generalization »
  Hiroki Naganuma · Kartik Ahuja · Ioannis Mitliagkas · Shiro Takagi · Tetsuya Motokawa · Rio Yokota · Kohta Ishikawa · Ikuro Sato
- 2022: A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods »
  Tiago Salvador · Kilian FATRAS · Ioannis Mitliagkas · Adam Oberman
- 2022: A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games »
  Samuel Sokota · Ryan D'Orazio · J. Zico Kolter · Nicolas Loizou · Marc Lanctot · Ioannis Mitliagkas · Noam Brown · Christian Kroer
- 2023 Competition: NeurIPS 2023 Machine Unlearning Competition »
  Eleni Triantafillou · Fabian Pedregosa · Meghdad Kurmanji · Kairan ZHAO · Gintare Karolina Dziugaite · Peter Triantafillou · Ioannis Mitliagkas · Vincent Dumoulin · Lisheng Sun · Peter Kairouz · Julio C Jacques Junior · Jun Wan · Sergio Escalera · Isabelle Guyon
- 2022 Poster: Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound »
  Charles Guille-Escuret · Adam Ibrahim · Baptiste Goujaud · Ioannis Mitliagkas
- 2021 Poster: Uniform Sampling over Episode Difficulty »
  Sébastien Arnold · Guneet Dhillon · Avinash Ravichandran · Stefano Soatto
- 2021 Poster: PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair »
  Zimin Chen · Vincent J Hellendoorn · Pascal Lamblin · Petros Maniatis · Pierre-Antoine Manzagol · Daniel Tarlow · Subhodeep Moitra
- 2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A »
  Sham M Kakade · Martha White · Nicolas Le Roux
- 2020 Poster: An operator view of policy gradient methods »
  Dibya Ghosh · Marlos C. Machado · Nicolas Le Roux
- 2019 Workshop: Bridging Game Theory and Deep Learning »
  Ioannis Mitliagkas · Gauthier Gidel · Niao He · Reyhane Askari Hemmat · N H · Nika Haghtalab · Simon Lacoste-Julien
- 2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
  Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle
- 2018: Poster Session 1 (note there are numerous missing names here, all papers appear in all poster sessions) »
  Akhilesh Gotmare · Kenneth Holstein · Jan Brabec · Michal Uricar · Kaleigh Clary · Cynthia Rudin · Sam Witty · Andrew Ross · Shayne O'Brien · Babak Esmaeili · Jessica Forde · Massimo Caccia · Ali Emami · Scott Jordan · Bronwyn Woods · D. Sculley · Rebekah Overdorf · Nicolas Le Roux · Peter Henderson · Brandon Yang · Tzu-Yu Liu · David Jensen · Niccolo Dalmasso · Weitang Liu · Paul Marc TRICHELAIR · Jun Ki Lee · Akanksha Atrey · Matt Groh · Yotam Hechtlinger · Emma Tosch
- 2015 Poster: Stop Wasting My Gradients: Practical SVRG »
  Reza Babanezhad Harikandeh · Mohamed Osama Ahmed · Alim Virani · Mark Schmidt · Jakub Konečný · Scott Sallinen
- 2012 Poster: A latent factor model for highly multi-relational data »
  Rodolphe Jenatton · Nicolas Le Roux · Antoine Bordes · Guillaume R Obozinski
- 2012 Poster: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
  Nicolas Le Roux · Mark Schmidt · Francis Bach
- 2012 Oral: A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets »
  Nicolas Le Roux · Mark Schmidt · Francis Bach
- 2011 Workshop: Deep Learning and Unsupervised Feature Learning »
  Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng
- 2011 Poster: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
  Mark Schmidt · Nicolas Le Roux · Francis Bach
- 2011 Oral: Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization »
  Mark Schmidt · Nicolas Le Roux · Francis Bach
- 2007 Poster: Learning the 2-D Topology of Images »
  Nicolas Le Roux · Yoshua Bengio · Pascal Lamblin · Marc Joliveau · Balázs Kégl
- 2007 Poster: Topmoumoute Online Natural Gradient Algorithm »
  Nicolas Le Roux · Pierre-Antoine Manzagol · Yoshua Bengio