Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations for the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning yields significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss while retaining as little as 3% of the model parameters.
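The core mechanism is simple to sketch: during fine-tuning, each weight is paired with a learned importance score, the forward pass keeps only the top-scoring weights, and a straight-through estimator lets the scores be trained by backpropagation. The following minimal PyTorch sketch illustrates this idea; it is not the authors' released implementation, and the names `TopKBinarizer`, `MovementPrunedLinear`, and `keep_fraction` are illustrative.

```python
import torch
import torch.nn as nn


class TopKBinarizer(torch.autograd.Function):
    """Keep the top fraction of importance scores; straight-through gradient."""

    @staticmethod
    def forward(ctx, scores, keep_fraction):
        k = max(1, int(keep_fraction * scores.numel()))
        mask = torch.zeros_like(scores)
        _, idx = scores.flatten().topk(k)
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: binarization is treated as the
        # identity, so the mask gradient flows to the scores unchanged.
        return grad_output, None


class MovementPrunedLinear(nn.Module):
    """Linear layer whose weights are masked by jointly learned scores."""

    def __init__(self, in_features, out_features, keep_fraction=0.03):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # Importance scores S, one per weight, trained by backprop
        # (zero initialization here is a simplification of the paper's setup).
        self.scores = nn.Parameter(torch.zeros_like(self.weight))
        self.keep_fraction = keep_fraction

    def forward(self, x):
        mask = TopKBinarizer.apply(self.scores, self.keep_fraction)
        # Via the straight-through estimator the scores receive the gradient
        # dL/dS = (dL/d(W * mask)) * W, so scores grow for weights that move
        # away from zero during fine-tuning.
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

Because the score gradient is proportional to the weight times its own gradient, the scores accumulate evidence that a weight is moving away from zero rather than measuring its current magnitude; this is what makes the criterion first-order and adaptive to fine-tuning, in contrast to magnitude pruning.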
Author Information
Victor Sanh (Hugging Face 🤗)
Thomas Wolf (Hugging Face 🤗)
Alexander Rush (Cornell University)
More from the Same Authors
- 2021: End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
  Samantha Petti · Nicholas Bhattacharya · Roshan Rao · Justas Dauparas · Neil Thomas · Juannan Zhou · Alexander Rush · Peter Koo · Sergey Ovchinnikov
- 2023 Poster: OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
  Hugo Laurençon · Lucile Saulnier · Leo Tronchon · Stas Bekman · Amanpreet Singh · Anton Lozhkov · Thomas Wang · Siddharth Karamcheti · Alexander Rush · Douwe Kiela · Matthieu Cord · Victor Sanh
- 2021: Differential Inference: A Criminally Underused Tool
  Alexander Rush (Cornell University)
- 2021 Poster: Distributed Deep Learning In Open Collaborations
  Michael Diskin · Alexey Bukhtiyarov · Max Ryabinin · Lucile Saulnier · Quentin Lhoest · Anton Sinitsin · Dmitry Popov · Dmitry V. Pyrkin · Maxim Kashirin · Alexander Borzunov · Albert Villanova del Moral · Denis Mazur · Ilia Kobelev · Yacine Jernite · Thomas Wolf · Gennady Pekhimenko
- 2021 Poster: Low-Rank Constraints for Fast Inference in Structured Models
  Justin Chiu · Yuntian Deng · Alexander Rush
- 2021: Training Transformers Together
  Alexander Borzunov · Max Ryabinin · Tim Dettmers · Quentin Lhoest · Lucile Saulnier · Michael Diskin · Yacine Jernite · Thomas Wolf
- 2020 Poster: Latent Template Induction with Gumbel-CRFs
  Yao Fu · Chuanqi Tan · Bin Bi · Mosha Chen · Yansong Feng · Alexander Rush
- 2020 Poster: Cascaded Text Generation with Markov Transformers
  Yuntian Deng · Alexander Rush
- 2020: An introduction to transfer learning in NLP and HuggingFace
  Thomas Wolf