Modern deep learning systems are fragile and do not generalize well under distribution shift. While much promising work has addressed these concerns, no systematic study has examined the role of optimizers in out-of-distribution generalization. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We consider image and text classification, using DomainBed, WILDS, and Backgrounds Challenge as out-of-distribution benchmarks. We search over a wide range of hyperparameters and examine the classification accuracy (in-distribution and out-of-distribution) of over 20,000 models. We arrive at the following findings: i) contrary to conventional wisdom, adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum-based SGD); ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. We believe these findings can help practitioners choose the right optimizer and know what behavior to expect. The code is available at https://anonymous.4open.science/r/OoD-Optimizer-Comparison-37DF.
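For readers less familiar with the distinction the abstract draws between adaptive and non-adaptive optimizers, the following is a minimal sketch of one update step of momentum SGD and of Adam on a single scalar parameter. It is not taken from the paper's code; the function names, default hyperparameters, and the scalar setting are assumptions chosen for illustration, and the update rules are the standard textbook forms.

```python
import math


def sgd_momentum_step(w, g, v, lr=0.1, momentum=0.9):
    """One heavy-ball SGD step (hypothetical helper, scalar parameter).

    The step size scales directly with the gradient magnitude.
    """
    v = momentum * v + g          # accumulate velocity
    w = w - lr * v                # move against the velocity
    return w, v


def adam_step(w, g, m, s, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (hypothetical helper, scalar parameter, t starts at 1).

    The gradient is rescaled by a running estimate of its magnitude,
    so the step size is roughly independent of the gradient scale.
    """
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate
    s = beta2 * s + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(s_hat) + eps)
    return w, m, s


# Same starting point and gradient for both optimizers.
w_sgd, _ = sgd_momentum_step(w=1.0, g=4.0, v=0.0)
w_adam, _, _ = adam_step(w=1.0, g=4.0, m=0.0, s=0.0, t=1)
print(w_sgd, w_adam)
```

On the first step, momentum SGD moves by lr times the gradient, while Adam moves by approximately lr regardless of the gradient's size. This per-parameter rescaling is what "adaptive" refers to; the sketch makes no claim about which behavior generalizes better.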
Author Information
Hiroki Naganuma (University of Montreal)
Kartik Ahuja (Mila)
Ioannis Mitliagkas (University of Montreal)
Shiro Takagi (Independent Researcher)
I am an independent researcher on intelligence. My long-term research goal is to create an artificial researcher. I am interested in symbolic fluency, memory, and autonomy.
Tetsuya Motokawa (University of Tsukuba)
Rio Yokota (Tokyo Institute of Technology, AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST))
Rio Yokota received his BS, MS, and PhD from Keio University in 2003, 2005, and 2009, respectively. He is currently an Associate Professor at GSIC, Tokyo Institute of Technology. His research interests include high-performance computing, hierarchical low-rank approximation methods, and scalable deep learning. He was part of the team that won the ACM Gordon Bell prize for price/performance in 2009.
Kohta Ishikawa (Denso IT Laboratory, Inc.)
Ikuro Sato (Tokyo Institute of Technology / Denso IT Laboratory)
More from the Same Authors
-
2021 Spotlight: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2022 : Neural Networks Efficiently Learn Low-Dimensional Representations with SGD »
Alireza Mousavi-Hosseini · Sejun Park · Manuela Girotti · Ioannis Mitliagkas · Murat Erdogdu -
2022 : ASDL: A Unified Interface for Gradient Preconditioning in PyTorch »
Kazuki Osawa · Satoki Ishikawa · Rio Yokota · Shigang Li · Torsten Hoefler -
2022 : Thoughts on the Applicability of Machine Learning to Scientific Discovery and Possible Future Research Directions (Perspective) »
Shiro Takagi -
2022 : Performative Prediction with Neural Networks »
Mehrnaz Mofakhami · Ioannis Mitliagkas · Gauthier Gidel -
2022 : A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods »
Tiago Salvador · Kilian FATRAS · Ioannis Mitliagkas · Adam Oberman -
2022 : Object-centric causal representation learning »
Amin Mansouri · Jason Hartford · Kartik Ahuja · Yoshua Bengio -
2022 : Interventional Causal Representation Learning »
Kartik Ahuja · Yixin Wang · Divyat Mahajan · Yoshua Bengio -
2022 : A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games »
Samuel Sokota · Ryan D'Orazio · J. Zico Kolter · Nicolas Loizou · Marc Lanctot · Ioannis Mitliagkas · Noam Brown · Christian Kroer -
2023 Poster: Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning »
Amit Dhurandhar · Karthikeyan Natesan Ramamurthy · Kartik Ahuja · Vijay Arya -
2023 Poster: Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation »
Sébastien Lachapelle · Divyat Mahajan · Ioannis Mitliagkas · Simon Lacoste-Julien -
2023 Poster: CADet: Fully Self-Supervised Out-Of-Distribution Detection With Contrastive Learning »
Charles Guille-Escuret · Pau Rodriguez · David Vazquez · Ioannis Mitliagkas · Joao Monteiro -
2023 Poster: Reusable Slotwise Mechanisms »
Trang Nguyen · Amin Mansouri · Kanika Madan · Khuong Duy Nguyen · Kartik Ahuja · Dianbo Liu · Yoshua Bengio -
2023 Oral: Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation »
Sébastien Lachapelle · Divyat Mahajan · Ioannis Mitliagkas · Simon Lacoste-Julien -
2023 Competition: NeurIPS 2023 Machine Unlearning Competition »
Eleni Triantafillou · Fabian Pedregosa · Meghdad Kurmanji · Kairan ZHAO · Gintare Karolina Dziugaite · Peter Triantafillou · Ioannis Mitliagkas · Vincent Dumoulin · Lisheng Sun · Peter Kairouz · Julio C Jacques Junior · Jun Wan · Sergio Escalera · Isabelle Guyon -
2022 : Managing the Whole Research Process on GitHub »
Shiro Takagi -
2022 : Separation of Research Data from Its Presentation »
Shiro Takagi -
2022 : FL Games: A Federated Learning Framework for Distribution Shifts »
Sharut Gupta · Kartik Ahuja · Mohammad Havaei · Niladri Chatterjee · Yoshua Bengio -
2022 Poster: Weakly Supervised Representation Learning with Sparse Perturbations »
Kartik Ahuja · Jason Hartford · Yoshua Bengio -
2022 Poster: Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound »
Charles Guille-Escuret · Adam Ibrahim · Baptiste Goujaud · Ioannis Mitliagkas -
2022 Poster: On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning »
Shiro Takagi -
2021 Poster: Adversarial Feature Desensitization »
Pouya Bashivan · Reza Bayat · Adam Ibrahim · Kartik Ahuja · Mojtaba Faramarzi · Touraj Laleh · Blake Richards · Irina Rish -
2021 Poster: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish -
2019 Workshop: Bridging Game Theory and Deep Learning »
Ioannis Mitliagkas · Gauthier Gidel · Niao He · Reyhane Askari Hemmat · N H · Nika Haghtalab · Simon Lacoste-Julien -
2019 Poster: Practical Deep Learning with Bayesian Principles »
Kazuki Osawa · Siddharth Swaroop · Mohammad Emtiyaz Khan · Anirudh Jain · Runa Eschenhagen · Richard Turner · Rio Yokota -
2019 Poster: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2019 Spotlight: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2017 : Coffee break and Poster Session I »
Nishith Khandwala · Steve Gallant · Gregory Way · Aniruddh Raghu · Li Shen · Aydan Gasimova · Alican Bozkurt · William Boag · Daniel Lopez-Martinez · Ulrich Bodenhofer · Samaneh Nasiri GhoshehBolagh · Michelle Guo · Christoph Kurz · Kirubin Pillay · Kimis Perros · George H Chen · Alexandre Yahi · Madhumita Sushil · Sanjay Purushotham · Elena Tutubalina · Tejpal Virdi · Marc-Andre Schulz · Samuel Weisenthal · Bharat Srikishan · Petar Veličković · Kartik Ahuja · Andrew Miller · Erin Craig · Disi Ji · Filip Dabek · Chloé Pou-Prom · Hejia Zhang · Janani Kalyanam · Wei-Hung Weng · Harish Bhat · Hugh Chen · Simon Kohl · Mingwu Gao · Tingting Zhu · Ming-Zher Poh · Iñigo Urteaga · Antoine Honoré · Alessandro De Palma · Maruan Al-Shedivat · Pranav Rajpurkar · Matthew McDermott · Vincent Chen · Yanan Sui · Yun-Geun Lee · Li-Fang Cheng · Chen Fang · Sibt ul Hussain · Cesare Furlanello · Zeev Waks · Hiba Chougrad · Hedvig Kjellstrom · Finale Doshi-Velez · Wolfgang Fruehwirt · Yanqing Zhang · Lily Hu · Junfang Chen · Sunho Park · Gatis Mikelsons · Jumana Dakka · Stephanie Hyland · yann chevaleyre · Hyunwoo Lee · Xavier Giro-i-Nieto · David Kale · Michael Hughes · Gabriel Erion · Rishab Mehra · William Zame · Stojan Trajanovski · Prithwish Chakraborty · Kelly Peterson · Muktabh Mayank Srivastava · Amy Jin · Heliodoro Tejeda Lemus · Priyadip Ray · Tamas Madl · Joseph Futoma · Enhao Gong · Syed Rameel Ahmad · Eric Lei · Ferdinand Legros -
2017 Poster: DPSCREEN: Dynamic Personalized Screening »
Kartik Ahuja · William Zame · Mihaela van der Schaar