Adaptive gradient methods that rely on scaling gradients down by the square root of exponential moving averages of past squared gradients, such as RMSProp, Adam, and Adadelta, have found wide application in optimizing the nonconvex problems that arise in deep learning. However, it has recently been demonstrated that such methods can fail to converge even in simple convex optimization settings. In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size. Our analysis shows that under this scenario such methods do converge to stationarity up to the statistical limit of variance in the stochastic gradients (scaled by a constant factor). In particular, our result implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the non-convergence issues. Furthermore, we provide a new adaptive optimization algorithm, Yogi, which controls the increase in effective learning rate, leading to even better performance with similar theoretical guarantees on convergence. Extensive experiments show that Yogi, with very little hyperparameter tuning, outperforms methods such as Adam in several challenging machine learning tasks.
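The abstract does not spell out the update rules, so the following is only a minimal scalar sketch of the idea it describes. It contrasts an exponential-moving-average second-moment estimate (as used by RMSProp and Adam) with an additive, sign-based update of the form commonly given for Yogi. The function names, the default values of `beta2`, `lr`, and `eps`, and the scalar setting are assumptions made for illustration, not details taken from this page.

```python
import math


def ema_second_moment(v, g, beta2=0.999):
    """EMA of squared gradients (RMSProp/Adam style): v decays multiplicatively."""
    return beta2 * v + (1.0 - beta2) * g * g


def yogi_second_moment(v, g, beta2=0.999):
    """Additive sign-based update (assumed Yogi form, not stated in the abstract).

    v moves toward g^2 by exactly (1 - beta2) * g^2, so when g is small
    v shrinks slowly and the effective learning rate grows slowly.
    """
    g2 = g * g
    sign = (v > g2) - (v < g2)  # sign(v - g^2) in {-1, 0, 1}
    return v - (1.0 - beta2) * sign * g2


def scaled_step(g, v, lr=0.01, eps=1e-3):
    """Gradient scaled down by the square root of the second-moment estimate."""
    return lr * g / (math.sqrt(v) + eps)


if __name__ == "__main__":
    # Start from a large estimate, then feed small gradients.
    v_ema = v_yogi = 1.0
    for _ in range(50):
        v_ema = ema_second_moment(v_ema, 0.01, beta2=0.9)
        v_yogi = yogi_second_moment(v_yogi, 0.01, beta2=0.9)
    print("EMA estimate:", v_ema, "step:", scaled_step(0.01, v_ema))
    print("Yogi-style estimate:", v_yogi, "step:", scaled_step(0.01, v_yogi))
```

In this sketch the EMA estimate shrinks by a factor of `beta2` per step regardless of gradient size, whereas the additive update shrinks by an amount proportional to the squared gradient, which is one way to read "controls the increase in effective learning rate".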
Author Information
Manzil Zaheer (Google)
Sashank Reddi (Google)
Devendra S Sachan (Carnegie Mellon University)
Satyen Kale (Google)
Sanjiv Kumar (Google Research)
More from the Same Authors
-
2022 : Differentially Private Adaptive Optimization with Delayed Preconditioners »
Tian Li · Manzil Zaheer · Ken Liu · Sashank Reddi · H. Brendan McMahan · Virginia Smith -
2022 : Effect of mixup Training on Representation Learning »
Arslan Chaudhry · Aditya Menon · Andreas Veit · Sadeep Jayasumana · Srikumar Ramalingam · Sanjiv Kumar -
2023 Poster: SOAR: Improved Quantization for Nearest Neighbor Search »
Philip Sun · David Simcha · Dave Dopson · Ruiqi Guo · Sanjiv Kumar -
2023 Poster: ResMem: Learn what you can and memorize the rest »
Zitong Yang · Michal Lukasik · Vaishnavh Nagarajan · Zonglin Li · Ankit Rawat · Manzil Zaheer · Aditya Menon · Sanjiv Kumar -
2023 Poster: On student-teacher deviations in distillation: does it pay to disobey? »
Vaishnavh Nagarajan · Aditya Menon · Srinadh Bhojanapalli · Hossein Mobahi · Sanjiv Kumar -
2023 Poster: When Does Confidence-Based Cascade Deferral Suffice? »
Wittawat Jitkrittum · Neha Gupta · Aditya Menon · Harikrishna Narasimhan · Ankit Rawat · Sanjiv Kumar -
2023 Poster: What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models »
Khashayar Gatmiry · Zhiyuan Li · Tengyu Ma · Sashank Reddi · Stefanie Jegelka · Ching-Yao Chuang -
2022 Poster: TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s »
Felix Chern · Blake Hechtman · Andy Davis · Ruiqi Guo · David Majnemer · Sanjiv Kumar -
2022 Poster: Decoupled Context Processing for Context Augmented Language Modeling »
Zonglin Li · Ruiqi Guo · Sanjiv Kumar -
2022 Poster: Post-hoc estimators for learning to defer to an expert »
Harikrishna Narasimhan · Wittawat Jitkrittum · Aditya Menon · Ankit Rawat · Sanjiv Kumar -
2022 Poster: Reproducibility in Optimization: Theoretical Framework and Limits »
Kwangjun Ahn · Prateek Jain · Ziwei Ji · Satyen Kale · Praneeth Netrapalli · Gil I Shamir -
2022 Poster: From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent »
Christopher De Sa · Satyen Kale · Jason Lee · Ayush Sekhari · Karthik Sridharan -
2021 Poster: SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs »
Ayush Sekhari · Karthik Sridharan · Satyen Kale -
2021 Poster: Batch Active Learning at Scale »
Gui Citovsky · Giulia DeSalvo · Claudio Gentile · Lazaros Karydas · Anand Rajagopalan · Afshin Rostamizadeh · Sanjiv Kumar -
2021 Poster: End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering »
Devendra Singh · Siva Reddy · Will Hamilton · Chris Dyer · Dani Yogatama -
2021 Poster: No Regrets for Learning the Prior in Bandits »
Soumya Basu · Branislav Kveton · Manzil Zaheer · Csaba Szepesvari -
2021 Poster: Learning with User-Level Privacy »
Daniel Levy · Ziteng Sun · Kareem Amin · Satyen Kale · Alex Kulesza · Mehryar Mohri · Ananda Theertha Suresh -
2021 Poster: Breaking the centralized barrier for cross-device federated learning »
Sai Praneeth Karimireddy · Martin Jaggi · Satyen Kale · Mehryar Mohri · Sashank Reddi · Sebastian Stich · Ananda Theertha Suresh -
2021 Poster: Efficient Training of Retrieval Models using Negative Cache »
Erik Lindgren · Sashank Reddi · Ruiqi Guo · Sanjiv Kumar -
2020 Poster: Estimating Training Data Influence by Tracing Gradient Descent »
Garima Pruthi · Frederick Liu · Satyen Kale · Mukund Sundararajan -
2020 Spotlight: Estimating Training Data Influence by Tracing Gradient Descent »
Garima Pruthi · Frederick Liu · Satyen Kale · Mukund Sundararajan -
2020 Poster: Why are Adaptive Methods Good for Attention Models? »
Jingzhao Zhang · Sai Praneeth Karimireddy · Andreas Veit · Seungyeon Kim · Sashank Reddi · Sanjiv Kumar · Suvrit Sra -
2020 Poster: Multi-Stage Influence Function »
Hongge Chen · Si Si · Yang Li · Ciprian Chelba · Sanjiv Kumar · Duane Boning · Cho-Jui Hsieh -
2020 Poster: O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers »
Chulhee Yun · Yin-Wen Chang · Srinadh Bhojanapalli · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar -
2020 Poster: Robust large-margin learning in hyperbolic space »
Melanie Weber · Manzil Zaheer · Ankit Singh Rawat · Aditya Menon · Sanjiv Kumar -
2020 Poster: PAC-Bayes Learning Bounds for Sample-Dependent Priors »
Pranjal Awasthi · Satyen Kale · Stefani Karp · Mehryar Mohri -
2020 Poster: Learning discrete distributions: user vs item-level privacy »
Yuhan Liu · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Michael D Riley -
2019 Workshop: Sets and Partitions »
Nicholas Monath · Manzil Zaheer · Andrew McCallum · Ari Kobren · Junier Oliva · Barnabas Poczos · Ruslan Salakhutdinov -
2019 Poster: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces »
Chuan Guo · Ali Mousavi · Xiang Wu · Daniel Holtmann-Rice · Satyen Kale · Sashank Reddi · Sanjiv Kumar -
2019 Poster: Multilabel reductions: what is my loss optimising? »
Aditya Menon · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar -
2019 Spotlight: Multilabel reductions: what is my loss optimising? »
Aditya Menon · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar -
2019 Poster: Hypothesis Set Stability and Generalization »
Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan -
2019 Poster: Sampled Softmax with Random Fourier Features »
Ankit Singh Rawat · Jiecao Chen · Felix Xinnan Yu · Ananda Theertha Suresh · Sanjiv Kumar -
2018 Poster: Nonparametric Density Estimation under Adversarial Losses »
Shashank Singh · Ananya Uppal · Boyue Li · Chun-Liang Li · Manzil Zaheer · Barnabas Poczos -
2018 Poster: Online Learning of Quantum States »
Scott Aaronson · Xinyi Chen · Elad Hazan · Satyen Kale · Ashwin Nayak -
2018 Poster: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2018 Spotlight: cpSGD: Communication-efficient and differentially-private distributed SGD »
Naman Agarwal · Ananda Theertha Suresh · Felix Xinnan Yu · Sanjiv Kumar · Brendan McMahan -
2017 : Now Playing: Continuous low-power music recognition »
Marvin Ritter · Ruiqi Guo · Sanjiv Kumar · Julian J Odell · Mihajlo Velimirović · Dominik Roblek · James Lyon -
2017 : Poster Session 1 and Lunch »
Sumanth Dathathri · Akshay Rangamani · Prakhar Sharma · Aruni RoyChowdhury · Madhu Advani · William Guss · Chulhee Yun · Corentin Hardy · Michele Alberti · Devendra Sachan · Andreas Veit · Takashi Shinozaki · Peter Chin -
2017 Oral: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Deep Sets »
Manzil Zaheer · Satwik Kottur · Siamak Ravanbakhsh · Barnabas Poczos · Ruslan Salakhutdinov · Alexander Smola -
2017 Poster: Multiscale Quantization for Fast Similarity Search »
Xiang Wu · Ruiqi Guo · Ananda Theertha Suresh · Sanjiv Kumar · Daniel Holtmann-Rice · David Simcha · Felix Yu -
2017 Poster: Parameter-Free Online Learning via Model Selection »
Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan -
2017 Spotlight: Parameter-Free Online Learning via Model Selection »
Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan -
2016 Poster: Orthogonal Random Features »
Felix Xinnan Yu · Ananda Theertha Suresh · Krzysztof M Choromanski · Daniel Holtmann-Rice · Sanjiv Kumar -
2016 Oral: Orthogonal Random Features »
Felix Xinnan Yu · Ananda Theertha Suresh · Krzysztof M Choromanski · Daniel Holtmann-Rice · Sanjiv Kumar -
2016 Poster: Hardness of Online Sleeping Combinatorial Optimization Problems »
Satyen Kale · Chansoo Lee · David Pal -
2015 : Discussion Panel »
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu -
2015 : Optimal and Adaptive Algorithms for Online Boosting »
Satyen Kale -
2015 Workshop: The 1st International Workshop "Feature Extraction: Modern Questions and Challenges" »
Dmitry Storcheus · Sanjiv Kumar · Afshin Rostamizadeh -
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar -
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar -
2015 Poster: Structured Transforms for Small-Footprint Deep Learning »
Vikas Sindhwani · Tara Sainath · Sanjiv Kumar -
2015 Spotlight: Structured Transforms for Small-Footprint Deep Learning »
Vikas Sindhwani · Tara Sainath · Sanjiv Kumar -
2015 Poster: Online Gradient Boosting »
Alina Beygelzimer · Elad Hazan · Satyen Kale · Haipeng Luo -
2014 Workshop: NIPS Workshop on Transactional Machine Learning and E-Commerce »
David Parkes · David H Wolpert · Jennifer Wortman Vaughan · Jacob D Abernethy · Amos Storkey · Mark Reid · Ping Jin · Nihar Bhadresh Shah · Mehryar Mohri · Luis E Ortiz · Robin Hanson · Aaron Roth · Satyen Kale · Sebastien Lahaie -
2014 Session: Oral Session 8 »
Sanjiv Kumar -
2014 Poster: Discrete Graph Hashing »
Wei Liu · Cun Mu · Sanjiv Kumar · Shih-Fu Chang -
2014 Spotlight: Discrete Graph Hashing »
Wei Liu · Cun Mu · Sanjiv Kumar · Shih-Fu Chang -
2013 Workshop: Large Scale Matrix Analysis and Inference »
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht -
2013 Poster: Adaptive Market Making via Online Learning »
Jacob D Abernethy · Satyen Kale -
2013 Oral: Adaptive Market Making via Online Learning »
Jacob D Abernethy · Satyen Kale -
2012 Poster: Angular Quantization based Binary Codes for Fast Similarity Search »
Yunchao Gong · Sanjiv Kumar · Vishal Verma · Svetlana Lazebnik -
2011 Poster: Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction »
Elad Hazan · Satyen Kale -
2010 Poster: Non-Stochastic Bandit Slate Problems »
Satyen Kale · Lev Reyzin · Robert E Schapire -
2009 Poster: Ensemble Nystrom Method »
Sanjiv Kumar · Mehryar Mohri · Ameet S Talwalkar -
2009 Poster: On Stochastic and Worst-case Models for Investing »
Elad Hazan · Satyen Kale -
2009 Oral: On Stochastic and Worst-case Models for Investing »
Elad Hazan · Satyen Kale -
2009 Poster: Beyond Convexity: Online Submodular Minimization »
Elad Hazan · Satyen Kale -
2007 Poster: Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria »
Elad Hazan · Satyen Kale