On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training
Jieyu Zhang · Bohan Wang · Zhengyu Hu · Pang Wei Koh · Alexander Ratner

Wed Dec 13 08:45 AM -- 10:45 AM (PST) @ Great Hall & Hall B1+B2 #429

Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study of their impact on downstream tasks. In this work, we study the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset. Empirically, we find that with the size of the pre-training dataset fixed, the best downstream performance comes from a balance between intra- and inter-class diversity. To understand the underlying mechanism, we show theoretically that the downstream performance depends monotonically on both types of diversity. Notably, our theory reveals that the optimal class-to-sample ratio (#classes / #samples per class) is invariant to the size of the pre-training dataset, which motivates an application: predicting the optimal number of pre-training classes. We demonstrate the effectiveness of this application with an improvement of around 2 points on downstream tasks when using ImageNet as the pre-training dataset.
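To make the invariance claim concrete, the following is a minimal sketch of how a fixed class-to-sample ratio could be turned into a predicted number of classes. It is not the authors' procedure. It assumes a perfectly balanced dataset, so that the total size N equals K classes times n samples per class; with ratio r = K / n, this gives K = sqrt(r * N) and n = sqrt(N / r). The function name `predict_split` and the ratio values used below are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple


def predict_split(total_samples: int, class_to_sample_ratio: float) -> Tuple[int, int]:
    """Return (num_classes, samples_per_class) for a given dataset size.

    Assumptions (not taken from the paper): classes are balanced, so
    total_samples = num_classes * samples_per_class, and the ratio
    num_classes / samples_per_class has already been estimated, for
    example on a smaller pre-training dataset.
    """
    if total_samples <= 0 or class_to_sample_ratio <= 0:
        raise ValueError("total_samples and class_to_sample_ratio must be positive")
    # From N = K * n and r = K / n it follows that K = sqrt(r * N).
    num_classes = max(1, round(math.sqrt(class_to_sample_ratio * total_samples)))
    # Integer division keeps the total at or below the budget.
    samples_per_class = max(1, total_samples // num_classes)
    return num_classes, samples_per_class


if __name__ == "__main__":
    # Hypothetical ratio, reused unchanged at two dataset sizes.
    ratio = 4.0
    for size in (40_000, 1_000_000):
        print(size, predict_split(size, ratio))
```

Because the ratio is held fixed, growing the dataset by a factor of c scales both the number of classes and the samples per class by sqrt(c) in this sketch, rather than adding only classes or only samples.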

Author Information

Jieyu Zhang (Department of Computer Science, University of Washington)
Bohan Wang (USTC)
Zhengyu Hu (HKUST)
Pang Wei Koh (University of Washington)
Alexander Ratner (Stanford University)

