Generalizing to out-of-distribution (OOD) data -- that is, data from domains unseen during training -- is a key challenge in modern machine learning that has only recently begun to receive substantial attention. Some existing approaches propose leveraging larger models and pre-training on larger datasets. In this paper, we provide new insights into applying these approaches. Concretely, we show that larger models and larger datasets must be leveraged \emph{simultaneously} to improve OOD performance. Moreover, we show that using smaller learning rates during fine-tuning is critical to achieving good results, contrary to the popular intuition that larger learning rates generalize better when training from scratch. We also show that, counter-intuitively, strategies that improve in-distribution accuracy may nonetheless lead to poor OOD performance. Our insights culminate in a method that achieves state-of-the-art results on a number of OOD generalization benchmarks, often by a significant margin.
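To make the abstract's recipe concrete -- a large pre-trained backbone, fine-tuned with a deliberately small learning rate -- here is a minimal PyTorch sketch. This is an illustration, not the authors' code: the ResNet-50 backbone, the learning-rate values, `num_classes`, and the synthetic batch are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): fine-tune a large
# ImageNet-pre-trained model with a small learning rate, following the
# recipe described in the abstract. All hyperparameters are illustrative.
import torch
import torchvision

num_classes = 10  # hypothetical target-task label count

# Larger model + larger pre-training dataset, leveraged together.
weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
model = torchvision.models.resnet50(weights=weights)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # new task head

# Small fine-tuning learning rate (e.g., 1e-4), in contrast to the larger
# rates (e.g., ~1e-1) commonly used when training from scratch.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

# Stand-in batch; in practice this would come from a DataLoader over the
# target-domain training data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

model.train()
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```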
Author Information
Yaodong Yu (University of California, Berkeley)
Heinrich Jiang (Google Research)
Dara Bahri (Google AI)
Hossein Mobahi (Google Research)
Seungyeon Kim (Google Research)
Ankit Rawat (Google Research)
Andreas Veit (Google)
Yi Ma (UC Berkeley)
More from the Same Authors
- 2021 : On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging »
  Chris Junchi Li · Yaodong Yu · Nicolas Loizou · Gauthier Gidel · Yi Ma · Nicolas Le Roux · Michael Jordan
- 2021 : Effect of Model Size on Worst-group Generalization »
  Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt
- 2022 Poster: Transformer Memory as a Differentiable Search Index »
  Yi Tay · Vinh Tran · Mostafa Dehghani · Jianmo Ni · Dara Bahri · Harsh Mehta · Zhen Qin · Kai Hui · Zhe Zhao · Jai Gupta · Tal Schuster · William Cohen · Donald Metzler
- 2022 : Effect of mixup Training on Representation Learning »
  Arslan Chaudhry · Aditya Menon · Andreas Veit · Sadeep Jayasumana · Srikumar Ramalingam · Sanjiv Kumar
- 2023 Poster: White-Box Transformers via Sparse Rate Reduction »
  Yaodong Yu · Sam Buchanan · Druv Pai · Tianzhe Chu · Ziyang Wu · Shengbang Tong · Benjamin Haeffele · Yi Ma
- 2023 Poster: ResMem: Learn what you can and memorize the rest »
  Zitong Yang · Michal Lukasik · Vaishnavh Nagarajan · Zonglin Li · Ankit Rawat · Manzil Zaheer · Aditya Menon · Sanjiv Kumar
- 2023 Poster: Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning »
  Mitsuhiko Nakamoto · Yuexiang Zhai · Anikait Singh · Max Sobol Mark · Yi Ma · Chelsea Finn · Aviral Kumar · Sergey Levine
- 2023 Poster: When Does Confidence-Based Cascade Deferral Suffice? »
  Wittawat Jitkrittum · Neha Gupta · Aditya Menon · Harikrishna Narasimhan · Ankit Rawat · Sanjiv Kumar
- 2023 Poster: Sharpness-Aware Minimization Leads to Low-Rank Features »
  Maksym Andriushchenko · Dara Bahri · Hossein Mobahi · Nicolas Flammarion
- 2022 : Invited Talk: Yi Ma »
  Yi Ma
- 2022 Poster: A Fourier Approach to Mixture Learning »
  Mingda Qiao · Guru Guruganesh · Ankit Rawat · Kumar Avinava Dubey · Manzil Zaheer
- 2022 Poster: Robust Calibration with Multi-domain Temperature Scaling »
  Yaodong Yu · Stephen Bates · Yi Ma · Michael Jordan
- 2022 Poster: Confident Adaptive Language Modeling »
  Tal Schuster · Adam Fisch · Jai Gupta · Mostafa Dehghani · Dara Bahri · Vinh Tran · Yi Tay · Donald Metzler
- 2022 Poster: Post-hoc estimators for learning to defer to an expert »
  Harikrishna Narasimhan · Wittawat Jitkrittum · Aditya Menon · Ankit Rawat · Sanjiv Kumar
- 2022 Poster: What You See is What You Get: Principled Deep Learning via Distributional Generalization »
  Bogdan Kulynych · Yao-Yuan Yang · Yaodong Yu · Jarosław Błasiok · Preetum Nakkiran
- 2022 Poster: TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels »
  Yaodong Yu · Alexander Wei · Sai Praneeth Karimireddy · Yi Ma · Michael Jordan
- 2022 Poster: Revisiting Sparse Convolutional Model for Visual Recognition »
  Xili Dai · Mingyang Li · Pengyuan Zhai · Shengbang Tong · Xingjian Gao · Shao-Lun Huang · Zhihui Zhu · Chong You · Yi Ma
- 2020 Poster: Boundary thickness and robustness in learning models »
  Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
- 2020 Poster: Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization »
  Chaobing Song · Yong Jiang · Yi Ma
- 2020 Poster: Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities »
  Chaobing Song · Zhengyuan Zhou · Yichao Zhou · Yong Jiang · Yi Ma
- 2020 Poster: Why are Adaptive Methods Good for Attention Models? »
  Jingzhao Zhang · Sai Praneeth Karimireddy · Andreas Veit · Seungyeon Kim · Sashank Reddi · Sanjiv Kumar · Suvrit Sra
- 2020 Poster: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization »
  Chong You · Zhihui Zhu · Qing Qu · Yi Ma
- 2020 Spotlight: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization »
  Chong You · Zhihui Zhu · Qing Qu · Yi Ma
- 2020 Poster: Faster DBSCAN via subsampled similarity queries »
  Heinrich Jiang · Jennifer Jang · Jakub Lacki
- 2020 Poster: Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction »
  Yaodong Yu · Kwan Ho Ryan Chan · Chong You · Chaobing Song · Yi Ma
- 2019 Poster: NeurVPS: Neural Vanishing Point Scanning via Conic Convolution »
  Yichao Zhou · Haozhi Qi · Jingwei Huang · Yi Ma
- 2018 : Adversarial Vision Challenge: Theory-inspired Approaches for Adversarial Machine Learning »
  Susu Xu · Yaodong Yu
- 2018 Poster: Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima »
  Yaodong Yu · Pan Xu · Quanquan Gu
- 2018 Poster: Diminishing Returns Shape Constraints for Interpretability and Regularization »
  Maya Gupta · Dara Bahri · Andrew Cotter · Kevin Canini
- 2018 Poster: To Trust Or Not To Trust A Classifier »
  Heinrich Jiang · Been Kim · Melody Guan · Maya Gupta
- 2017 : Poster Session 1 and Lunch »
  Sumanth Dathathri · Akshay Rangamani · Prakhar Sharma · Aruni RoyChowdhury · Madhu Advani · William Guss · Chulhee Yun · Corentin Hardy · Michele Alberti · Devendra Sachan · Andreas Veit · Takashi Shinozaki · Peter Chin
- 2016 Workshop: Nonconvex Optimization for Machine Learning: Theory and Practice »
  Hossein Mobahi · Anima Anandkumar · Percy Liang · Stefanie Jegelka · Anna Choromanska
- 2016 Poster: Residual Networks Behave Like Ensembles of Relatively Shallow Networks »
  Andreas Veit · Michael J Wilber · Serge Belongie
- 2015 : Non convex Optimization by Complexity Progression »
  Hossein Mobahi
- 2015 Poster: Learning with a Wasserstein Loss »
  Charlie Frogner · Chiyuan Zhang · Hossein Mobahi · Mauricio Araya · Tomaso Poggio