Workshop: Distribution shifts: connecting methods and applications (DistShift)

An Empirical Study of Pre-trained Models on Out-of-distribution Generalization

Yaodong Yu · Heinrich Jiang · Dara Bahri · Hossein Mobahi · Seungyeon Kim · Ankit Rawat · Andreas Veit · Yi Ma


Generalizing to out-of-distribution (OOD) data -- that is, data from domains unseen during training -- is a key challenge in modern machine learning, which has only recently received much attention. Some existing approaches propose leveraging larger models and pre-training on larger datasets. In this paper, we provide new insights in applying these approaches. Concretely, we show that larger models and larger datasets need to be \emph{simultaneously} leveraged to improve OOD performance. Moreover, we show that using smaller learning rates during fine-tuning is critical to achieving good results, contrary to popular intuition that larger learning rates generalize better when training from scratch. We show that strategies that improve in-distribution accuracy may, counter-intuitively, lead to poor OOD performance despite strong in-distribution performance. Our insights culminate to a method that achieves state-of-the-art results on a number of OOD generalization benchmark tasks, often by a significant margin.

Chat is not available.