Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on the bias-variance trade-off attributes this improvement to the reduction in model size. However, recent studies on over-parameterization characterize a new model-size regime in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction: while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning's effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, improving the training loss relative to the dense model, and that it provides additional regularization at other sparsities, reducing the accuracy degradation due to noisy examples relative to the dense model. Pruning extends model training time and reduces model size; these two factors improve training and add regularization, respectively. We empirically demonstrate that both factors are essential to fully explaining pruning's impact on generalization.
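As a concrete illustration of the kind of prune-and-retrain pipeline discussed in the abstract, the sketch below prunes a small network to a fixed sparsity with global magnitude (L1) pruning and compares its final training loss to that of a dense baseline. This is a minimal sketch, not the authors' experimental setup: the toy model, synthetic data, sparsity level, and hyperparameters are illustrative assumptions, and it uses PyTorch's `torch.nn.utils.prune` utilities rather than the exact pruning algorithms studied in the paper.

```python
# Minimal sketch (assumptions noted above): prune a trained toy network to a
# target sparsity with global magnitude pruning, retrain the surviving weights,
# and compare the resulting training loss against a dense baseline.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Toy synthetic classification data (illustrative assumption, not the paper's datasets).
X = torch.randn(512, 20)
y = (X[:, :2].sum(dim=1) > 0).long()

def make_model():
    return nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))

def train(model, epochs=50):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# Dense baseline.
dense = make_model()
dense_loss = train(dense)

# Train, then prune to 80% sparsity with global magnitude (L1) pruning,
# then retrain; pruned weights stay zero because the mask is applied in the forward pass.
sparse = make_model()
train(sparse)
params = [(m, "weight") for m in sparse if isinstance(m, nn.Linear)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.8)
sparse_loss = train(sparse)

print(f"dense train loss:  {dense_loss:.4f}")
print(f"pruned train loss: {sparse_loss:.4f}")
```

On this toy problem the comparison is only meant to show where the two training losses would be measured; the paper's claims concern large-scale models and a range of sparsities.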
Author Information
Tian Jin (Massachusetts Institute of Technology)
Michael Carbin (MIT)
Dan Roy (U of Toronto; Vector)
Jonathan Frankle (School of Engineering and Applied Sciences, Harvard University)
Gintare Karolina Dziugaite (Google Research, Brain Team)
More from the Same Authors
- 2021 Spotlight: Towards a Unified Information-Theoretic Framework for Generalization »
  Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran · Dan Roy
- 2021 : Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization »
  Blair Bilodeau · Jeffrey Negrea · Dan Roy
- 2021 : Stochastic Pruning: Fine-Tuning, and PAC-Bayes bound optimization »
  Soufiane Hayou · Bobby He · Gintare Karolina Dziugaite
- 2021 : The Dynamics of Functional Diversity throughout Neural Network Training »
  Lee Zamparo · Marc-Etienne Brunet · Thomas George · Sepideh Kharaghani · Gintare Karolina Dziugaite
- 2022 : Unmasking the Lottery Ticket Hypothesis: Efficient Adaptive Pruning for Finding Winning Tickets »
  Mansheej Paul · Feng Chen · Brett Larsen · Jonathan Frankle · Surya Ganguli · Gintare Karolina Dziugaite
- 2022 : The Effect of Data Dimensionality on Neural Network Prunability »
  Zachary Ankner · Alex Renda · Gintare Karolina Dziugaite · Jonathan Frankle · Tian Jin
- 2023 Poster: Shaped Attention Mechanism in the Infinite Depth-and-Width Limit at Initialization »
  Lorenzo Noci · Chuning Li · Mufan Li · Bobby He · Thomas Hofmann · Chris Maddison · Dan Roy
- 2023 Poster: RapidBERT: Pretraining BERT from Scratch for $20 »
  Jacob Portes · Alexander Trott · Sam Havens · DANIEL KING · Abhinav Venigalla · Moin Nadeem · Nikhil Sardana · Daya Khudia · Jonathan Frankle
- 2023 Workshop: UniReps: Unifying Representations in Neural Models »
  Marco Fumero · Emanuele Rodolà · Francesco Locatello · Gintare Karolina Dziugaite · Mathilde Caron · Clémentine Dominé
- 2022 Poster: Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks »
  Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
- 2022 Poster: The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization »
  Mufan Li · Mihai Nica · Dan Roy
- 2022 Poster: Adaptively Exploiting d-Separators with Causal Bandits »
  Blair Bilodeau · Linbo Wang · Dan Roy
- 2021 : Learning Neurosymbolic Performance Models »
  Michael Carbin
- 2021 Poster: The future is log-Gaussian: ResNets and their infinite-depth-and-width limit at initialization »
  Mufan Li · Mihai Nica · Dan Roy
- 2021 Poster: Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers »
  Jeffrey Negrea · Blair Bilodeau · Nicolò Campolongo · Francesco Orabona · Dan Roy
- 2021 Poster: Deep Learning on a Data Diet: Finding Important Examples Early in Training »
  Mansheej Paul · Surya Ganguli · Gintare Karolina Dziugaite
- 2021 Poster: Towards a Unified Information-Theoretic Framework for Generalization »
  Mahdi Haghifam · Gintare Karolina Dziugaite · Shay Moran · Dan Roy
- 2020 : Keynote 5: Gintare Karolina Dziugaite »
  Gintare Karolina Dziugaite
- 2020 Poster: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel »
  Stanislav Fort · Gintare Karolina Dziugaite · Mansheej Paul · Sepideh Kharaghani · Daniel Roy · Surya Ganguli
- 2020 Poster: Adaptive Gradient Quantization for Data-Parallel SGD »
  Fartash Faghri · Iman Tabrizian · Ilia Markov · Dan Alistarh · Daniel Roy · Ali Ramezani-Kebrya
- 2020 Poster: Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms »
  Mahdi Haghifam · Jeffrey Negrea · Ashish Khisti · Daniel Roy · Gintare Karolina Dziugaite
- 2020 Poster: In search of robust measures of generalization »
  Gintare Karolina Dziugaite · Alexandre Drouin · Brady Neal · Nitarshan Rajkumar · Ethan Caballero · Linbo Wang · Ioannis Mitliagkas · Daniel Roy
- 2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
  Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang Wang · Michael Carbin
- 2019 : Lunch break & Poster session »
  Breandan Considine · Michael Innes · Du Phan · Dougal Maclaurin · Robin Manhaeve · Alexey Radul · Shashi Gowda · Ekansh Sharma · Eli Sennesh · Maxim Kochurov · Gordon Plotkin · Thomas Wiecki · Navjot Kukreja · Chung-chieh Shan · Matthew Johnson · Dan Belov · Neeraj Pradhan · Wannes Meert · Angelika Kimmig · Luc De Raedt · Brian Patton · Matthew Hoffman · Rif A. Saurous · Daniel Roy · Eli Bingham · Martin Jankowiak · Colin Carroll · Junpeng Lao · Liam Paull · Martin Abadi · Angel Rojas Jimenez · JP Chen
- 2019 : Lunch Break and Posters »
  Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
- 2019 Workshop: Machine Learning with Guarantees »
  Ben London · Gintare Karolina Dziugaite · Daniel Roy · Thorsten Joachims · Aleksander Madry · John Shawe-Taylor
- 2019 Poster: Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates »
  Jeffrey Negrea · Mahdi Haghifam · Gintare Karolina Dziugaite · Ashish Khisti · Daniel Roy
- 2019 Poster: Compiler Auto-Vectorization with Imitation Learning »
  Charith Mendis · Cambridge Yang · Yewen Pu · Saman Amarasinghe · Michael Carbin
- 2019 Poster: Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes »
  Jun Yang · Shengyang Sun · Daniel Roy
- 2018 Poster: Data-dependent PAC-Bayes priors via differential privacy »
  Gintare Karolina Dziugaite · Daniel Roy
- 2017 : Daniel Roy - Deep Neural Networks: From Flat Minima to Numerically Nonvacuous Generalization Bounds via PAC-Bayes »
  Daniel Roy
- 2016 Poster: Measuring the reliability of MCMC inference with bidirectional Monte Carlo »
  Roger Grosse · Siddharth Ancha · Daniel Roy
- 2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming »
  Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden
- 2014 Poster: Gibbs-type Indian Buffet Processes »
  Creighton Heaukulani · Daniel Roy
- 2014 Poster: Mondrian Forests: Efficient Online Random Forests »
  Balaji Lakshminarayanan · Daniel Roy · Yee Whye Teh
- 2013 Session: Session Chair »
  Daniel Roy
- 2013 Session: Tutorial Session B »
  Daniel Roy
- 2012 Workshop: Probabilistic Programming: Foundations and Applications (2 day) »
  Vikash Mansinghka · Daniel Roy · Noah Goodman
- 2012 Poster: Random function priors for exchangeable graphs and arrays »
  James R Lloyd · Daniel Roy · Peter Orbanz · Zoubin Ghahramani
- 2011 Poster: Complexity of Inference in Latent Dirichlet Allocation »
  David Sontag · Daniel Roy
- 2011 Spotlight: Complexity of Inference in Latent Dirichlet Allocation »
  David Sontag · Daniel Roy
- 2008 Workshop: Probabilistic Programming: Universal Languages, Systems and Applications »
  Daniel Roy · John Winn · David A McAllester · Vikash Mansinghka · Josh Tenenbaum
- 2008 Oral: The Mondrian Process »
  Daniel Roy · Yee Whye Teh
- 2008 Poster: The Mondrian Process »
  Daniel Roy · Yee Whye Teh
- 2007 Poster: Bayesian Agglomerative Clustering with Coalescents »
  Yee Whye Teh · Hal Daumé III · Daniel Roy
- 2007 Oral: Bayesian Agglomerative Clustering with Coalescents »
  Yee Whye Teh · Hal Daumé III · Daniel Roy
- 2006 Poster: Learning annotated hierarchies from relational data »
  Daniel Roy · Charles Kemp · Vikash Mansinghka · Josh Tenenbaum
- 2006 Talk: Learning annotated hierarchies from relational data »
  Daniel Roy · Charles Kemp · Vikash Mansinghka · Josh Tenenbaum