Timezone: »
Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple over-parameterized problems, adaptive methods often find drastically different solutions than vanilla stochastic gradient descent (SGD). We construct an illustrative binary classification problem where the data is linearly separable, SGD achieves zero test error, and AdaGrad and Adam attain test errors arbitrarily close to 1/2. We additionally study the empirical generalization capability of adaptive methods on several state-of-the-art deep learning models. We observe that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.
Author Information
Ashia C Wilson (UC Berkeley)
Becca Roelofs (UC Berkeley)
Mitchell Stern (UC Berkeley)
Nati Srebro (TTI-Chicago)
Benjamin Recht (UC Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Poster: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Thu. Dec 7th 02:30 -- 06:30 AM Room Pacific Ballroom #158
More from the Same Authors
-
2021 Spotlight: On the Power of Differentiable Learning versus PAC and SQ Learning »
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro -
2021 : Do ImageNet Classifiers Generalize to ImageNet? »
Benjamin Recht · Becca Roelofs · Ludwig Schmidt · Vaishaal Shankar -
2021 : Evaluating Machine Accuracy on ImageNet »
Vaishaal Shankar · Becca Roelofs · Horia Mania · Benjamin Recht · Ludwig Schmidt -
2021 : Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2021 : Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Nathan Srebro · Zhaoran Wang · Zhuoran Yang -
2022 : Distributed Online and Bandit Convex Optimization »
Kumar Kshitij Patel · Aadirupa Saha · Nati Srebro · Lingxiao Wang -
2022 Poster: A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models »
Lijia Zhou · Frederic Koehler · Pragya Sur · Danica J. Sutherland · Nati Srebro -
2022 Poster: On Margin Maximization in Linear and ReLU Networks »
Gal Vardi · Ohad Shamir · Nati Srebro -
2022 Poster: Towards Optimal Communication Complexity in Distributed Non-Convex Optimization »
Kumar Kshitij Patel · Lingxiao Wang · Blake Woodworth · Brian Bullins · Nati Srebro -
2022 Poster: Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization »
Idan Amir · Roi Livni · Nati Srebro -
2022 Poster: The Sample Complexity of One-Hidden-Layer Neural Networks »
Gal Vardi · Ohad Shamir · Nati Srebro -
2022 Poster: Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets »
Gene Li · Cong Ma · Nati Srebro -
2022 Poster: Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization »
Omar Montasser · Steve Hanneke · Nati Srebro -
2022 Poster: Understanding the Eluder Dimension »
Gene Li · Pritish Kamath · Dylan J Foster · Nati Srebro -
2022 Poster: Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Anmol Kabra · Nati Srebro · Zhaoran Wang · Zhuoran Yang -
2021 : Live panel: Did we solve ImageNet? »
Shibani Santurkar · Alexander Kolesnikov · Becca Roelofs -
2021 : Is ImageNet Solved? Evaluating Machine Accuracy »
Becca Roelofs -
2021 Poster: On the Power of Differentiable Learning versus PAC and SQ Learning »
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro -
2021 Oral: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting »
Frederic Koehler · Lijia Zhou · Danica J. Sutherland · Nathan Srebro -
2021 Poster: Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting »
Frederic Koehler · Lijia Zhou · Danica J. Sutherland · Nathan Srebro -
2021 Poster: Representation Costs of Linear Neural Networks: Analysis and Design »
Zhen Dai · Mina Karzand · Nathan Srebro -
2021 Poster: An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning »
Blake Woodworth · Nathan Srebro -
2021 Poster: A Stochastic Newton Algorithm for Distributed Convex Optimization »
Brian Bullins · Kshitij Patel · Ohad Shamir · Nathan Srebro · Blake Woodworth -
2020 : Contributed Talk 6: Do Offline Metrics Predict Online Performance in Recommender Systems? »
Karl Krauth · Sarah Dean · Wenshuo Guo · Benjamin Recht · Michael Jordan -
2020 Poster: On Uniform Convergence and Low-Norm Interpolation Learning »
Lijia Zhou · Danica J. Sutherland · Nati Srebro -
2020 Poster: Reducing Adversarially Robust Learning to Non-Robust PAC Learning »
Omar Montasser · Steve Hanneke · Nati Srebro -
2020 Spotlight: On Uniform Convergence and Low-Norm Interpolation Learning »
Lijia Zhou · Danica J. Sutherland · Nati Srebro -
2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Poster: Minibatch vs Local SGD for Heterogeneous Distributed Learning »
Blake Woodworth · Kumar Kshitij Patel · Nati Srebro -
2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu -
2020 Poster: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2020 Spotlight: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: Model Similarity Mitigates Test Set Overuse »
Horia Mania · John Miller · Ludwig Schmidt · Moritz Hardt · Benjamin Recht -
2019 Poster: A Meta-Analysis of Overfitting in Machine Learning »
Becca Roelofs · Vaishaal Shankar · Benjamin Recht · Sara Fridovich-Keil · Moritz Hardt · John Miller · Ludwig Schmidt -
2019 Poster: Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator »
Karl Krauth · Stephen Tu · Benjamin Recht -
2019 Poster: Certainty Equivalence is Efficient for Linear Quadratic Control »
Horia Mania · Stephen Tu · Benjamin Recht -
2018 Poster: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro -
2018 Spotlight: Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization »
Blake Woodworth · Jialei Wang · Adam Smith · Brendan McMahan · Nati Srebro -
2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Poster: Simple random search of static linear policies is competitive for reinforcement learning »
Horia Mania · Aurelia Guy · Benjamin Recht -
2018 Poster: Stochastic Cubic Regularization for Fast Nonconvex Optimization »
Nilesh Tripuraneni · Mitchell Stern · Chi Jin · Jeffrey Regier · Michael Jordan -
2018 Poster: The Everlasting Database: Statistical Validity at a Fair Price »
Blake Woodworth · Vitaly Feldman · Saharon Rosset · Nati Srebro -
2018 Oral: Stochastic Cubic Regularization for Fast Nonconvex Optimization »
Nilesh Tripuraneni · Mitchell Stern · Chi Jin · Jeffrey Regier · Michael Jordan -
2018 Poster: Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator »
Sarah Dean · Horia Mania · Nikolai Matni · Benjamin Recht · Stephen Tu -
2018 Poster: On preserving non-discrimination when combining expert advice »
Avrim Blum · Suriya Gunasekar · Thodoris Lykouris · Nati Srebro -
2018 Poster: Blockwise Parallel Decoding for Deep Autoregressive Models »
Mitchell Stern · Noam Shazeer · Jakob Uszkoreit -
2017 Workshop: OPT 2017: Optimization for Machine Learning »
Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht -
2017 Poster: Stochastic Approximation for Canonical Correlation Analysis »
Raman Arora · Teodor Vanislavov Marinov · Poorya Mianjy · Nati Srebro -
2017 Poster: Exploring Generalization in Deep Learning »
Behnam Neyshabur · Srinadh Bhojanapalli · David Mcallester · Nati Srebro -
2017 Poster: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2017 Spotlight: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2017 Oral: Test of Time Award »
ali rahimi · Benjamin Recht -
2017 Poster: Kernel Feature Selection via Conditional Covariance Minimization »
Jianbo Chen · Mitchell Stern · Martin J Wainwright · Michael Jordan -
2016 Poster: Tight Complexity Bounds for Optimizing Composite Objectives »
Blake Woodworth · Nati Srebro -
2016 Poster: The Power of Adaptivity in Identifying Statistical Alternatives »
Kevin Jamieson · Daniel Haas · Benjamin Recht -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher Ré · Benjamin Recht -
2016 Poster: Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis »
Weiran Wang · Jialei Wang · Dan Garber · Dan Garber · Nati Srebro -
2016 Poster: Global Optimality of Local Search for Low Rank Matrix Recovery »
Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2016 Poster: Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations »
Behnam Neyshabur · Yuhuai Wu · Russ Salakhutdinov · Nati Srebro -
2016 Poster: Equality of Opportunity in Supervised Learning »
Moritz Hardt · Eric Price · Eric Price · Nati Srebro -
2016 Poster: Normalized Spectral Map Synchronization »
Yanyao Shen · Qixing Huang · Nati Srebro · Sujay Sanghavi -
2015 Poster: Parallel Correlation Clustering on Big Graphs »
Xinghao Pan · Dimitris Papailiopoulos · Samet Oymak · Benjamin Recht · Kannan Ramchandran · Michael Jordan -
2015 Poster: Path-SGD: Path-Normalized Optimization in Deep Neural Networks »
Behnam Neyshabur · Russ Salakhutdinov · Nati Srebro -
2014 Poster: Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm »
Deanna Needell · Rachel Ward · Nati Srebro -
2013 Workshop: Learning Faster From Easy Data »
Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck -
2013 Workshop: Large Scale Matrix Analysis and Inference »
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht -
2013 Poster: Stochastic Optimization of PCA with Capped MSG »
Raman Arora · Andrew Cotter · Nati Srebro -
2013 Poster: Auditing: Active Learning with Outcome-Dependent Query Costs »
Sivan Sabato · Anand D Sarwate · Nati Srebro -
2013 Poster: Streaming Variational Bayes »
Tamara Broderick · Nicholas Boyd · Andre Wibisono · Ashia C Wilson · Michael Jordan -
2013 Poster: The Power of Asymmetry in Binary Hashing »
Behnam Neyshabur · Nati Srebro · Russ Salakhutdinov · Yury Makarychev · Payman Yadollahpour -
2012 Poster: Sparse Prediction with the $k$-Support Norm »
Andreas Argyriou · Rina Foygel · Nati Srebro -
2012 Spotlight: Sparse Prediction with the $k$-Support Norm »
Andreas Argyriou · Rina Foygel · Nati Srebro -
2012 Poster: Matrix reconstruction with the local max norm »
Rina Foygel · Nati Srebro · Russ Salakhutdinov -
2011 Poster: Beating SGD: Learning SVMs in Sublinear Time »
Elad Hazan · Tomer Koren · Nati Srebro -
2011 Poster: Better Mini-Batch Algorithms via Accelerated Gradient Methods »
Andrew Cotter · Ohad Shamir · Nati Srebro · Karthik Sridharan -
2011 Poster: On the Universality of Online Mirror Descent »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2011 Poster: Learning with the weighted trace-norm under arbitrary sampling distributions »
Rina Foygel · Russ Salakhutdinov · Ohad Shamir · Nati Srebro -
2010 Session: Spotlights Session 11 »
Nati Srebro -
2010 Session: Oral Session 13 »
Nati Srebro -
2010 Poster: Tight Sample Complexity of Large-Margin Learning »
Sivan Sabato · Nati Srebro · Naftali Tishby -
2010 Poster: Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm »
Russ Salakhutdinov · Nati Srebro -
2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp -
2010 Poster: Smoothness, Low Noise and Fast Rates »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2009 Workshop: Understanding Multiple Kernel Learning Methods »
Brian McFee · Gert Lanckriet · Francis Bach · Nati Srebro -
2009 Poster: Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data »
Boaz Nadler · Nati Srebro · Xueyuan Zhou -
2009 Spotlight: Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data »
Boaz Nadler · Nati Srebro · Xueyuan Zhou -
2008 Poster: Fast Rates for Regularized Objectives »
Karthik Sridharan · Shai Shalev-Shwartz · Nati Srebro