
Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Hiroki Naganuma · Kartik Ahuja · Ioannis Mitliagkas · Shiro Takagi · Tetsuya Motokawa · Rio Yokota · Kohta Ishikawa · Ikuro Sato
Event URL: https://openreview.net/forum?id=i1s663Cqt9

Modern deep learning systems are fragile and do not generalize well under distribution shifts. While much promising work has addressed these concerns, a systematic study of the role of optimizers in out-of-distribution generalization has not been undertaken. In this study, we examine the performance of popular first-order optimizers under different classes of distribution shift, using both empirical risk minimization and invariant risk minimization. We study image and text classification, using DomainBed, WILDS, and the Backgrounds Challenge as out-of-distribution benchmarks. We search over a wide range of hyperparameters and examine the classification accuracy (in-distribution and out-of-distribution) of over 20,000 models. We arrive at the following findings: i) contrary to conventional wisdom, adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum-based SGD); ii) in-distribution and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. We believe these findings can help practitioners choose the right optimizer and know what behavior to expect. The code is available at https://anonymous.4open.science/r/OoD-Optimizer-Comparison-37DF.
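The adaptive/non-adaptive distinction the abstract draws can be illustrated with a minimal sketch (not the paper's code, which is at the linked repository): the three optimizer families compared above, implemented in pure Python on a toy 1-D quadratic f(w) = (w - 3)^2, whose gradient is 2(w - 3). All hyperparameter values here are illustrative defaults, not the paper's searched settings.

```python
import math

def grad(w):
    # Gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def sgd(w=0.0, lr=0.1, steps=200):
    # Plain (non-adaptive) stochastic gradient descent.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum_sgd(w=0.0, lr=0.1, beta=0.9, steps=200):
    # Momentum-based SGD: accumulate a velocity, step along it.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)
        w -= lr * v
    return w

def adam(w=0.0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    # Adam: adapts the step size per step using first/second moment
    # estimates of the gradient, with bias correction.
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, fn in [("SGD", sgd), ("Momentum SGD", momentum_sgd), ("Adam", adam)]:
    print(f"{name}: w = {fn():.4f}")  # each should approach the minimum at w = 3
```

On this convex toy problem all three converge; the paper's finding concerns their relative out-of-distribution accuracy on real benchmarks, which a toy example cannot show.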

Author Information

Hiroki Naganuma (University of Montreal)
Kartik Ahuja (Mila)
Ioannis Mitliagkas (University of Montreal)
Shiro Takagi (Independent Researcher)

I am an independent researcher studying intelligence. My long-term research goal is to create an artificial researcher. I am interested in symbolic fluency, memory, and autonomy.

Tetsuya Motokawa (University of Tsukuba)
Rio Yokota (Tokyo Institute of Technology; AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST))

Rio Yokota received his BS, MS, and PhD from Keio University in 2003, 2005, and 2009, respectively. He is currently an Associate Professor at GSIC, Tokyo Institute of Technology. His research interests span high performance computing, hierarchical low-rank approximation methods, and scalable deep learning. He was part of the team that won the ACM Gordon Bell prize for price/performance in 2009.

Kohta Ishikawa (Denso IT Laboratory, Inc.)
Ikuro Sato (Tokyo Institute of Technology / Denso IT Laboratory)
