Poster
SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs
Ayush Sekhari · Karthik Sridharan · Satyen Kale
Multi-epoch, small-batch Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for why SGD works well in practice is that the algorithm has an implicit regularization that biases its output towards a good solution. Perhaps the theoretically best-understood learning setting for SGD is Stochastic Convex Optimization (SCO), where it is well known that SGD learns at a rate of $O(1/\sqrt{n})$, where $n$ is the number of samples. In this paper, we consider the problem of SCO and explore the role of implicit regularization, batch size, and multiple epochs for SGD. Our main contributions are threefold:

- We show that for any regularizer, there is an SCO problem for which Regularized Empirical Risk Minimization (RERM) fails to learn. This automatically rules out any implicit-regularization-based explanation for the success of SGD.
- We provide a separation between SGD and learning via Gradient Descent on the empirical loss (GD) in terms of sample complexity. We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate of at least $\widetilde{\Omega}(1/n^{5/12})$.
- We present a multi-epoch variant of SGD commonly used in practice. We prove that this algorithm is at least as good as single-pass SGD in the worst case; however, for certain SCO problems, taking multiple passes over the dataset can significantly outperform single-pass SGD.

We extend our results to the general learning setting by exhibiting a problem that is learnable for any data distribution, and on which SGD is strictly better than RERM for any regularization function. We conclude by discussing the implications of our results for deep learning, and show a separation between SGD and ERM for two-layer diagonal neural networks.
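To make the single-pass vs. multi-epoch distinction concrete, here is a minimal numeric sketch of both variants on a toy one-dimensional convex problem (the quadratic loss, constants, and data distribution below are illustrative choices, not the paper's construction):

```python
import numpy as np

def single_pass_sgd(data, lr=0.1):
    """One pass of SGD on f(w; x) = 0.5 * (w - x)^2: each sample is used exactly once."""
    w = 0.0
    iterates = [w]
    for x in data:
        w -= lr * (w - x)          # stochastic gradient of the convex loss at sample x
        iterates.append(w)
    return float(np.mean(iterates))  # averaged iterate, standard for SCO guarantees

def multi_epoch_sgd(data, epochs=5, lr=0.1, seed=0):
    """Multiple shuffled passes over the same n samples (the practical variant)."""
    rng = np.random.default_rng(seed)
    w = 0.0
    iterates = [w]
    for _ in range(epochs):
        for x in rng.permutation(data):
            w -= lr * (w - x)
            iterates.append(w)
    return float(np.mean(iterates))

# Samples from a distribution whose population minimizer is E[x] = 2.0
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=200)

w_single = single_pass_sgd(data)
w_multi = multi_epoch_sgd(data)
```

Both variants approach the population minimizer here; the paper's point is that on carefully constructed SCO instances their behaviors provably separate, which no toy example of this kind can show by itself.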
Author Information
Ayush Sekhari (Cornell University)
Karthik Sridharan (Cornell)
Satyen Kale (Google)
More from the Same Authors
- 2021 Spotlight: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2022 Poster: Reproducibility in Optimization: Theoretical Framework and Limits
  Kwangjun Ahn · Prateek Jain · Ziwei Ji · Satyen Kale · Praneeth Netrapalli · Gil I Shamir
- 2022 Poster: From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
  Christopher De Sa · Satyen Kale · Jason Lee · Ayush Sekhari · Karthik Sridharan
- 2021 Poster: Learning with User-Level Privacy
  Daniel Levy · Ziteng Sun · Kareem Amin · Satyen Kale · Alex Kulesza · Mehryar Mohri · Ananda Theertha Suresh
- 2021 Poster: Breaking the centralized barrier for cross-device federated learning
  Sai Praneeth Karimireddy · Martin Jaggi · Satyen Kale · Mehryar Mohri · Sashank Reddi · Sebastian Stich · Ananda Theertha Suresh
- 2021 Poster: Neural Active Learning with Performance Guarantees
  Zhilei Wang · Pranjal Awasthi · Christoph Dann · Ayush Sekhari · Claudio Gentile
- 2021 Poster: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2021 Poster: Remember What You Want to Forget: Algorithms for Machine Unlearning
  Ayush Sekhari · Jayadev Acharya · Gautam Kamath · Ananda Theertha Suresh
- 2020 Poster: Estimating Training Data Influence by Tracing Gradient Descent
  Garima Pruthi · Frederick Liu · Satyen Kale · Mukund Sundararajan
- 2020 Spotlight: Estimating Training Data Influence by Tracing Gradient Descent
  Garima Pruthi · Frederick Liu · Satyen Kale · Mukund Sundararajan
- 2020 Poster: Reinforcement Learning with Feedback Graphs
  Christoph Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan
- 2020 Poster: PAC-Bayes Learning Bounds for Sample-Dependent Priors
  Pranjal Awasthi · Satyen Kale · Stefani Karp · Mehryar Mohri
- 2019 Poster: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces
  Chuan Guo · Ali Mousavi · Xiang Wu · Daniel Holtmann-Rice · Satyen Kale · Sashank Reddi · Sanjiv Kumar
- 2019 Poster: Hypothesis Set Stability and Generalization
  Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan
- 2018 Poster: Online Learning of Quantum States
  Scott Aaronson · Xinyi Chen · Elad Hazan · Satyen Kale · Ashwin Nayak
- 2018 Poster: Adaptive Methods for Nonconvex Optimization
  Manzil Zaheer · Sashank Reddi · Devendra S Sachan · Satyen Kale · Sanjiv Kumar
- 2018 Poster: Uniform Convergence of Gradients for Non-Convex Learning and Optimization
  Dylan Foster · Ayush Sekhari · Karthik Sridharan
- 2017 Poster: Parameter-Free Online Learning via Model Selection
  Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan
- 2017 Spotlight: Parameter-Free Online Learning via Model Selection
  Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan
- 2016 Poster: Hardness of Online Sleeping Combinatorial Optimization Problems
  Satyen Kale · Chansoo Lee · David Pal
- 2015: Discussion Panel
  Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu
- 2015: Optimal and Adaptive Algorithms for Online Boosting
  Satyen Kale
- 2015 Poster: Online Gradient Boosting
  Alina Beygelzimer · Elad Hazan · Satyen Kale · Haipeng Luo
- 2014 Workshop: NIPS Workshop on Transactional Machine Learning and E-Commerce
  David Parkes · David H Wolpert · Jennifer Wortman Vaughan · Jacob D Abernethy · Amos Storkey · Mark Reid · Ping Jin · Nihar Bhadresh Shah · Mehryar Mohri · Luis E Ortiz · Robin Hanson · Aaron Roth · Satyen Kale · Sebastien Lahaie
- 2013 Workshop: Large Scale Matrix Analysis and Inference
  Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht
- 2013 Poster: Adaptive Market Making via Online Learning
  Jacob D Abernethy · Satyen Kale
- 2013 Oral: Adaptive Market Making via Online Learning
  Jacob D Abernethy · Satyen Kale
- 2011 Poster: Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
  Elad Hazan · Satyen Kale
- 2010 Poster: Non-Stochastic Bandit Slate Problems
  Satyen Kale · Lev Reyzin · Robert E Schapire
- 2009 Poster: On Stochastic and Worst-case Models for Investing
  Elad Hazan · Satyen Kale
- 2009 Oral: On Stochastic and Worst-case Models for Investing
  Elad Hazan · Satyen Kale
- 2009 Poster: Beyond Convexity: Online Submodular Minimization
  Elad Hazan · Satyen Kale
- 2007 Poster: Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria
  Elad Hazan · Satyen Kale