Poster
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
AJAY JAISWAL · Shiwei Liu · Tianlong Chen · Zhangyang "Atlas" Wang
Large pre-trained transformers are the $\textit{show-stealers}$ of modern-day deep learning, and as they grow in scale it becomes crucial to comprehend the parsimonious patterns that exist within them. With exploding parameter counts, the Lottery Ticket Hypothesis (LTH) and its variants have lost their pragmatism for sparsifying such models, owing to the high computation and memory bottleneck of the repetitive $\textit{train-prune-retrain}$ routine of iterative magnitude pruning (IMP), which worsens with increasing model size. In this paper, we comprehensively study $\textit{induced sparse patterns}$ across multiple large pre-trained vision and language transformers. We propose the existence of $\textbf{essential sparsity}$, defined by a $\textbf{sharp dropping point}$ beyond which performance declines much faster as the sparsity level rises, when we directly remove the weights with the smallest magnitudes in a $\textbf{one-shot}$ manner. We also present an intriguing emerging phenomenon of $\textbf{abrupt sparsification}$ during the pre-training of BERT, i.e., BERT suddenly becomes heavily sparse after a certain number of pre-training iterations. Moreover, our observations indicate a $\textbf{counter-intuitive}$ finding: BERT trained with a larger amount of pre-training data tends to have a better ability to condense knowledge into comparatively fewer parameters. Lastly, we investigate the effect of the pre-training loss on essential sparsity and discover that self-supervised learning (SSL) objectives trigger stronger emergent sparsification properties than supervised learning (SL). All our code will be publicly available.
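To make the one-shot removal described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' released code. It assumes a single global magnitude threshold shared across all weight arrays (the abstract does not say whether pruning is global or per layer), and the names `one_shot_magnitude_prune` and `sparsity_of` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def one_shot_magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest |w|.

    Assumption: one global threshold over all arrays in `weights`.
    No retraining or iterative schedule is involved (one shot).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(sparsity * magnitudes.size)  # number of entries to remove
    if k == 0:
        return [w.copy() for w in weights]
    # k-th smallest magnitude; entries tied with it are removed as well.
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    return [np.where(np.abs(w) > threshold, w, 0.0) for w in weights]


def sparsity_of(weights):
    """Fraction of exactly-zero entries across all arrays."""
    total = sum(w.size for w in weights)
    zeros = sum(int(np.count_nonzero(w == 0)) for w in weights)
    return zeros / total


if __name__ == "__main__":
    # Toy stand-in for pre-trained weights (random values, illustration only).
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(64, 64)), rng.normal(size=(64, 16))]
    for s in (0.1, 0.3, 0.5, 0.7, 0.9):
        pruned = one_shot_magnitude_prune(layers, s)
        print(f"target sparsity {s:.1f} -> measured {sparsity_of(pruned):.3f}")
```

Locating the sharp dropping point would additionally require evaluating downstream performance of the pruned model at each sparsity level, which this sketch leaves out.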
Author Information
AJAY JAISWAL (The University of Texas, Austin)
Shiwei Liu (UT Austin)
I am a third-year Ph.D. student in the Data Mining Group, Department of Mathematics and Computer Science, Eindhoven University of Technology (TU/e). My current research topics are dynamic sparse training, sparse neural networks, pruning, the generalization of neural networks, etc. I am looking for a postdoc position in machine learning.
Tianlong Chen (MIT/Harvard/UNC Chapel Hill)
Zhangyang "Atlas" Wang (University of Texas at Austin)
More from the Same Authors
-
2022 : HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing »
Tianlong Chen · Chengyue Gong · Daniel Diaz · Xuxi Chen · Jordan Wells · Qiang Liu · Zhangyang "Atlas" Wang · Andrew Ellington · Alex Dimakis · Adam Klivans -
2023 : scCLIP: Multi-modal Single-cell Contrastive Learning Integration Pre-training »
Lei Xiong · Tianlong Chen · Manolis Kellis -
2023 Poster: Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask? »
Hoang Pham · The Anh Ta · Shiwei Liu · Lichuan Xiang · Dung Le · Hongkai Wen · Long Tran-Thanh -
2023 Poster: Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models »
Zhendong Wang · Yifan Jiang · Huangjie Zheng · Peihao Wang · Pengcheng He · Zhangyang "Atlas" Wang · Weizhu Chen · Mingyuan Zhou -
2023 Poster: Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling »
Haotao Wang · Ziyu Jiang · Yuning You · Yan Han · Gaowen Liu · Jayanth Srinivasa · Ramana Kompella · Zhangyang "Atlas" Wang -
2023 Poster: Don’t just prune by magnitude! Your mask topology is a secret weapon »
Duc Hoang · Souvik Kundu · Shiwei Liu · Zhangyang "Atlas" Wang -
2023 Poster: Dynamic Sparsity Is Channel-Level Sparsity Learner »
Lu Yin · Gen Li · Meng Fang · Li Shen · Tianjin Huang · Zhangyang "Atlas" Wang · Vlado Menkovski · Xiaolong Ma · Mykola Pechenizkiy · Shiwei Liu -
2023 Poster: In-Context Learning Unlocked for Diffusion Models »
Zhendong Wang · Yifan Jiang · Yadong Lu · yelong shen · Pengcheng He · Weizhu Chen · Zhangyang "Atlas" Wang · Mingyuan Zhou -
2023 Poster: H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models »
Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Ré · Clark Barrett · Zhangyang "Atlas" Wang · Beidi Chen -
2022 Spotlight: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang "Atlas" Wang -
2022 Poster: Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets »
Ruisi Cai · Zhenyu Zhang · Tianlong Chen · Xiaohan Chen · Zhangyang "Atlas" Wang -
2022 Poster: Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative »
Tianxin Wei · Yuning You · Tianlong Chen · Yang Shen · Jingrui He · Zhangyang "Atlas" Wang -
2022 Poster: Signal Processing for Implicit Neural Representations »
Dejia Xu · Peihao Wang · Yifan Jiang · Zhiwen Fan · Zhangyang "Atlas" Wang -
2022 Poster: Dynamic Sparse Network for Time Series Classification: Learning What to “See” »
Qiao Xiao · Boqian Wu · Yu Zhang · Shiwei Liu · Mykola Pechenizkiy · Elena Mocanu · Decebal Constantin Mocanu -
2022 Poster: Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation »
Ziyu Jiang · Xuxi Chen · Xueqin Huang · Xianzhi Du · Denny Zhou · Zhangyang "Atlas" Wang -
2022 Poster: Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork »
Haotao Wang · Junyuan Hong · Aston Zhang · Jiayu Zhou · Zhangyang "Atlas" Wang -
2022 Poster: Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization »
Junru Wu · Yi Liang · feng han · Hassan Akbari · Zhangyang "Atlas" Wang · Cong Yu -
2022 Poster: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis »
Wuyang Chen · Wei Huang · Xinyu Gong · Boris Hanin · Zhangyang "Atlas" Wang -
2022 Poster: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang "Atlas" Wang -
2022 Poster: Symbolic Distillation for Learned TCP Congestion Control »
S P Sharan · Wenqing Zheng · Kuo-Feng Hsu · Jiarong Xing · Ang Chen · Zhangyang "Atlas" Wang -
2022 Poster: M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design »
hanxue liang · Zhiwen Fan · Rishov Sarkar · Ziyu Jiang · Tianlong Chen · Kai Zou · Yu Cheng · Cong Hao · Zhangyang "Atlas" Wang -
2022 Poster: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again »
AJAY JAISWAL · Peihao Wang · Tianlong Chen · Justin Rousseau · Ying Ding · Zhangyang "Atlas" Wang -
2022 Poster: Advancing Model Pruning via Bi-level Optimization »
Yihua Zhang · Yuguang Yao · Parikshit Ram · Pu Zhao · Tianlong Chen · Mingyi Hong · Yanzhi Wang · Sijia Liu -
2022 Poster: A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking »
Keyu Duan · Zirui Liu · Peihao Wang · Wenqing Zheng · Kaixiong Zhou · Tianlong Chen · Xia Hu · Zhangyang "Atlas" Wang -
2021 Poster: Improving Contrastive Learning on Imbalanced Data via Open-World Sampling »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang -
2021 Poster: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration »
Shiwei Liu · Tianlong Chen · Xiaohan Chen · Zahra Atashgahi · Lu Yin · Huanyu Kou · Li Shen · Mykola Pechenizkiy · Zhangyang Wang · Decebal Constantin Mocanu -
2021 Poster: Chasing Sparsity in Vision Transformers: An End-to-End Exploration »
Tianlong Chen · Yu Cheng · Zhe Gan · Lu Yuan · Lei Zhang · Zhangyang Wang -
2021 Poster: Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective »
Tianlong Chen · Yu Cheng · Zhe Gan · Jingjing Liu · Zhangyang Wang -
2021 Poster: Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot? »
Xiaolong Ma · Geng Yuan · Xuan Shen · Tianlong Chen · Xuxi Chen · Xiaohan Chen · Ning Liu · Minghai Qin · Sijia Liu · Zhangyang Wang · Yanzhi Wang -
2021 Poster: You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership »
Xuxi Chen · Tianlong Chen · Zhenyu Zhang · Zhangyang Wang -
2020 Workshop: Second Workshop on AI for Humanitarian Assistance and Disaster Response »
Ritwik Gupta · Robin Murphy · Eric Heim · Zhangyang "Atlas" Wang · Bryce Goodman · Nirav Patel · Piotr Bilinski · Edoardo Nemni -
2020 Poster: Graph Contrastive Learning with Augmentations »
Yuning You · Tianlong Chen · Yongduo Sui · Ting Chen · Zhangyang "Atlas" Wang · Yang Shen -
2020 Poster: MATE: Plugging in Model Awareness to Task Embedding for Meta Learning »
Xiaohan Chen · Zhangyang "Atlas" Wang · Siyu Tang · Krikamol Muandet -
2020 Poster: Robust Pre-Training by Adversarial Contrastive Learning »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang "Atlas" Wang -
2020 Poster: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang "Atlas" Wang -
2020 Spotlight: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang "Atlas" Wang -
2020 Poster: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free »
Haotao Wang · Tianlong Chen · Shupeng Gui · TingKuei Hu · Ji Liu · Zhangyang "Atlas" Wang -
2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training »
Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang "Atlas" Wang · Yingyan Lin -
2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang "Atlas" Wang · Michael Carbin -
2020 Poster: ShiftAddNet: A Hardware-Inspired Deep Network »
Haoran You · Xiaohan Chen · Yongan Zhang · Chaojian Li · Sicheng Li · Zihao Liu · Zhangyang "Atlas" Wang · Yingyan Lin