We present a smoothly broken power law functional form that accurately models the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies as the amount of compute used for training, the number of model parameters, or the training dataset size varies) for each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision and unsupervised language tasks, arithmetic, and reinforcement learning. This functional form yields extrapolations of scaling behavior that are often an order of magnitude more accurate than those obtained with other functional forms for neural scaling behavior. Moreover, this functional form accurately models the non-monotonic transitions present in the scaling behavior of phenomena such as double descent, and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior.
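The abstract does not spell out the functional form. As a minimal sketch, assuming the parameterization y = a + b·x^(-c0) · Π_i (1 + (x/d_i)^(1/f_i))^(-c_i·f_i) (an assumption here, not stated on this page), the Python below evaluates a smoothly broken power law and checks numerically that a break changes the log-log slope. The function names `bnsl` and `loglog_slope` and all parameter names are hypothetical.

```python
import math
from typing import Callable, Sequence


def bnsl(x: float, a: float, b: float, c0: float,
         c: Sequence[float], d: Sequence[float], f: Sequence[float]) -> float:
    """Smoothly broken power law with len(c) breaks (assumed parameterization).

    a:     additive offset (the limit as x grows when the total exponent is positive)
    b, c0: scale and exponent of the first power-law segment
    c[i]:  change in the power-law exponent at break i
    d[i]:  x-location of break i
    f[i]:  smoothness of break i (smaller means a sharper break)
    """
    if not (len(c) == len(d) == len(f)):
        raise ValueError("c, d and f must have the same length")
    y = b * x ** (-c0)
    for ci, di, fi in zip(c, d, f):
        # This factor is ~1 well before the break and ~(x/di)**(-ci) well after it,
        # so each break adds -ci to the log-log slope of (y - a).
        y *= (1.0 + (x / di) ** (1.0 / fi)) ** (-ci * fi)
    return a + y


def loglog_slope(fn: Callable[[float], float], x: float, a: float = 0.0,
                 h: float = 1e-4) -> float:
    """Central-difference estimate of d log(y - a) / d log x at x."""
    upper = math.log(fn(x * math.exp(h)) - a)
    lower = math.log(fn(x * math.exp(-h)) - a)
    return (upper - lower) / (2.0 * h)


# Usage: one break at x = 1e3, slope -0.5 before it and -(0.5 + 1.0) after it.
def curve(x: float) -> float:
    return bnsl(x, a=0.0, b=1.0, c0=0.5, c=[1.0], d=[1e3], f=[0.1])


print(round(loglog_slope(curve, 1.0), 3))  # → -0.5
print(round(loglog_slope(curve, 1e6), 3))  # → -1.5
```

In this sketch each break multiplies the curve by a factor that switches from 1 to a pure power law, which is what lets a single expression bend between straight segments on a log-log plot; with very small f_i and very large x/d_i the intermediate power (x/d_i)^(1/f_i) can overflow a float.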
Author Information
Ethan Caballero (Mila)
https://www.google.com/#q=ethan+caballero
Kshitij Gupta (Université de Montréal)
Irina Rish (Mila/UdeM)
David Krueger (Mila, University of Montreal)
More from the Same Authors
2021 Spotlight: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish

2021: FedGMA: Federated Learning with Gradient Masked Averaging
Irene Tenison · Sai Aravind Sreeramadas · Vaikkunth Mugunthan · Irina Rish

2022: A Mechanistic Lens on Mode Connectivity
Ekdeep S Lubana · Eric Bigelow · Robert Dick · David Krueger · Hidenori Tanaka

2022: Broken Neural Scaling Laws
Ethan Caballero · Kshitij Gupta · Irina Rish · David Krueger

2022: On The Fragility of Learned Reward Functions
Lev McKinney · Yawen Duan · David Krueger · Adam Gleave

2022: David Krueger: Sources of Specification Failure
David Krueger

2022 Social: ML Safety NeurIPS Social
David Krueger · Orpheus Lummis · Joshua Clymer

2022 Poster: Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar · Kshitij Gupta · Anirudh Goyal · Nitesh Bharadwaj Gundavarapu · Alex Lamb · Nan Rosemary Ke · Yoshua Bengio

2021 Poster: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish

2020 Poster: Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning
Massimo Caccia · Pau Rodriguez · Oleksiy Ostapenko · Fabrice Normandin · Min Lin · Lucas Page-Caccia · Issam Hadj Laradji · Irina Rish · Alexandre Lacoste · David Vázquez · Laurent Charlin

2020 Poster: In search of robust measures of generalization
Gintare Karolina Dziugaite · Alexandre Drouin · Brady Neal · Nitarshan Rajkumar · Ethan Caballero · Linbo Wang · Ioannis Mitliagkas · Daniel Roy

2017: Break + Poster (1)
Devendra Singh Chaplot · Chih-Yao Ma · Simon Brodeur · Eri Matsuo · Ichiro Kobayashi · Seitaro Shinagawa · Koichiro Yoshino · Yuhong Guo · Ben Murdoch · Kanthashree Mysore Sathyendra · Daniel Ricks · Haichao Zhang · Joshua Peterson · Li Zhang · Mircea Mironenco · Peter Anderson · Mark Johnson · Kang Min Yoo · Guntis Barzdins · Ahmed H Zaidi · Martin Andrews · Sam Witteveen · Subbareddy Oota · Prashanth Vijayaraghavan · Ke Wang · Yan Zhu · Renars Liepins · Max Quinn · Amit Raj · Vincent Cartillier · Eric Chu · Ethan Caballero · Fritz Obermeyer