Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher · Robert Geirhos · Shashank Shekhar · Surya Ganguli · Ari Morcos

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #337

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show, both in theory and practice, that we can break beyond power-law scaling and reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this new exponential scaling prediction with pruned dataset size empirically, and indeed observe better-than-power-law scaling performance on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of 9 different data pruning metrics on ImageNet. We find that most existing high-performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore develop a new simple, cheap, and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
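The core operation the abstract describes — ranking training examples by a pruning metric and keeping only enough of them to reach a target dataset size — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of random scores, and the choice to keep the highest-scoring ("hardest") examples are all assumptions for the sketch.

```python
import numpy as np

def prune_dataset(scores, keep_fraction):
    """Select a subset of training examples using a pruning metric.

    scores: per-example scores from some pruning metric
            (here, higher is assumed to mean harder / more informative).
    keep_fraction: fraction of the dataset to retain, in (0, 1].
    Returns sorted indices of the retained examples.
    """
    n_keep = max(1, int(len(scores) * keep_fraction))
    # Rank examples from highest to lowest score, keep the top n_keep.
    order = np.argsort(scores)[::-1]
    return np.sort(order[:n_keep])

# Toy illustration with random stand-in scores: keep 40% of 10 examples.
rng = np.random.default_rng(0)
scores = rng.random(10)
kept = prune_dataset(scores, 0.4)
```

The `kept` indices would then define the pruned training set (e.g. via a subset sampler in a training framework). Note that which end of the ranking to keep is itself a design choice the paper analyzes; this sketch simply keeps the top-scoring examples.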

Author Information

Ben Sorscher (Stanford University)
Robert Geirhos (Google Brain)
Shashank Shekhar (Meta AI Research)
I am an AI Resident at FAIR (Meta AI) in Montreal, Canada. Prior to this, I was a research master's student at the University of Guelph and the Vector Institute in Ontario, Canada. My areas of interest are self-supervised learning, representation learning, and computer vision.

Surya Ganguli (Stanford)
Ari Morcos (Meta AI, FAIR Team)