
Workshop: Distribution shifts: connecting methods and applications (DistShift)

Effect of Model Size on Worst-group Generalization

Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt


A popular belief, based on recent work, suggests that overparameterization increases worst-group test error on datasets with spurious correlations in the minority subgroups. These works focus on the case where the subgroups are labelled. To gain a more complete picture, we investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings. We evaluate overparameterized ResNet, VGG, and BERT models in the vision and natural language processing domains on datasets with spurious correlations. We improve on the experimental setup of prior works by (1) studying the effect of model size by varying the depth and width of widely used model architectures, and (2) comparing the trends for pretrained models with those for models trained from scratch. We empirically demonstrate that increasing pretrained model size, by increasing either depth or width, either reduces or does not increase worst-group test error under ERM. The Waterbirds and MultiNLI datasets in particular show a monotonic increase in worst-group accuracy as model size increases. Our systematic study provides benchmarks over a set of datasets and model architectures, and guidance to researchers working on problems without access to subgroup labels.
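Worst-group accuracy is the accuracy on whichever subgroup the model handles worst, as opposed to the average accuracy over all examples. The abstract does not describe the evaluation code, so the following is only a minimal sketch under stated assumptions: each group is taken to be a (label, spurious attribute) pair, in the style of Waterbirds (bird type crossed with background). The function names and the toy predictions are hypothetical and are not data from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Accuracy computed separately within each group (hypothetical helper)."""
    if not (len(preds) == len(labels) == len(groups)):
        raise ValueError("preds, labels, and groups must have the same length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(preds, labels, groups) -> float:
    """Minimum per-group accuracy: the metric that averages can hide."""
    return min(group_accuracies(preds, labels, groups).values())


def average_accuracy(preds, labels) -> float:
    """Ordinary accuracy over all examples, the quantity ERM targets."""
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Toy, made-up data: groups are (label, background) pairs. The two majority
# groups, (0, "land") and (1, "water"), are classified perfectly; the two
# small minority groups are each half wrong.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
backgrounds = ["land"] * 4 + ["water"] * 4 + ["land"] * 2 + ["water"] * 2
groups = list(zip(labels, backgrounds))
preds = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1]

print(average_accuracy(preds, labels))              # ~0.833
print(worst_group_accuracy(preds, labels, groups))  # 0.5
```

In this toy case the average accuracy looks high because the majority groups dominate the count, while the worst-group number exposes the failure on the minority groups. This gap is why the study reports worst-group accuracy.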
