Timezone: »
Poster
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen · Yu Cheng · Zhe Gan · Lu Yuan · Lei Zhang · Zhangyang Wang
Vision transformers (ViTs) have recently received explosive popularity, but their enormous model sizes and training costs remain daunting. Conventional post-training pruning often incurs higher training budgets. In contrast, this paper aims to trim down both the training memory overhead and the inference complexity, without sacrificing the achievable accuracy. We carry out the first-of-its-kind comprehensive exploration, on taking a unified approach of integrating sparsity in ViTs "from end to end''. Specifically, instead of training full ViTs, we dynamically extract and train sparse subnetworks, while sticking to a fixed small parameter budget. Our approach jointly optimizes model parameters and explores connectivity throughout training, ending up with one sparse network as the final output. The approach is seamlessly extended from unstructured to structured sparsity, the latter by considering to guide the prune-and-grow of self-attention heads inside ViTs. We further co-explore data and architecture sparsity for additional efficiency gains by plugging in a novel learnable token selector to adaptively determine the currently most vital patches. Extensive results on ImageNet with diverse ViT backbones validate the effectiveness of our proposals which obtain significantly reduced computational cost and almost unimpaired generalization. Perhaps most surprisingly, we find that the proposed sparse (co-)training can sometimes \textit{improve the ViT accuracy} rather than compromising it, making sparsity a tantalizing "free lunch''. For example, our sparsified DeiT-Small at ($5\%$, $50\%$) sparsity for (data, architecture), improves $\mathbf{0.28\%}$ top-1 accuracy, and meanwhile enjoys $\mathbf{49.32\%}$ FLOPs and $\mathbf{4.40\%}$ running time savings. Our codes are available at https://github.com/VITA-Group/SViTE.
Author Information
Tianlong Chen (Unversity of Texas at Austin)
Yu Cheng (Microsoft Research)
Zhe Gan (Duke University)
Lu Yuan (Microsoft)
Lei Zhang (Microsoft)
Zhangyang Wang (UT Austin)
More from the Same Authors
-
2021 : VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation »
Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu -
2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models »
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li -
2022 Poster: REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering »
Yuanze Lin · Yujia Xie · Dongdong Chen · Yichong Xu · Chenguang Zhu · Lu Yuan -
2022 Poster: OmniVL: One Foundation Model for Image-Language and Video-Language Tasks »
Junke Wang · Dongdong Chen · Zuxuan Wu · Chong Luo · Luowei Zhou · Yucheng Zhao · Yujia Xie · Ce Liu · Yu-Gang Jiang · Lu Yuan -
2022 : HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing »
Tianlong Chen · Chengyue Gong · Daniel Diaz · Xuxi Chen · Jordan Wells · Qiang Liu · Zhangyang Wang · Andrew Ellington · Alex Dimakis · Adam Klivans -
2022 Spotlight: OmniVL: One Foundation Model for Image-Language and Video-Language Tasks »
Junke Wang · Dongdong Chen · Zuxuan Wu · Chong Luo · Luowei Zhou · Yucheng Zhao · Yujia Xie · Ce Liu · Yu-Gang Jiang · Lu Yuan -
2022 Spotlight: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang Wang -
2022 Poster: Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets »
Ruisi Cai · Zhenyu Zhang · Tianlong Chen · Xiaohan Chen · Zhangyang Wang -
2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge »
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao -
2022 Poster: Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative »
Tianxin Wei · Yuning You · Tianlong Chen · Yang Shen · Jingrui He · Zhangyang Wang -
2022 Poster: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang Wang -
2022 Poster: Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning »
Yujia Xie · Luowei Zhou · Xiyang Dai · Lu Yuan · Nguyen Bach · Ce Liu · Michael Zeng -
2022 Poster: M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design »
hanxue liang · Zhiwen Fan · Rishov Sarkar · Ziyu Jiang · Tianlong Chen · Kai Zou · Yu Cheng · Cong Hao · Zhangyang Wang -
2022 Poster: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again »
AJAY JAISWAL · Peihao Wang · Tianlong Chen · Justin Rousseau · Ying Ding · Zhangyang Wang -
2022 Poster: GLIPv2: Unifying Localization and Vision-Language Understanding »
Haotian Zhang · Pengchuan Zhang · Xiaowei Hu · Yen-Chun Chen · Liunian Li · Xiyang Dai · Lijuan Wang · Lu Yuan · Jenq-Neng Hwang · Jianfeng Gao -
2022 Poster: Advancing Model Pruning via Bi-level Optimization »
Yihua Zhang · Yuguang Yao · Parikshit Ram · Pu Zhao · Tianlong Chen · Mingyi Hong · Yanzhi Wang · Sijia Liu -
2022 Poster: A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking »
Keyu Duan · Zirui Liu · Peihao Wang · Wenqing Zheng · Kaixiong Zhou · Tianlong Chen · Xia Hu · Zhangyang Wang -
2021 Poster: Improving Contrastive Learning on Imbalanced Data via Open-World Sampling »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang -
2021 Poster: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration »
Shiwei Liu · Tianlong Chen · Xiaohan Chen · Zahra Atashgahi · Lu Yin · Huanyu Kou · Li Shen · Mykola Pechenizkiy · Zhangyang Wang · Decebal Constantin Mocanu -
2021 : Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models »
Boxin Wang · Chejian Xu · Shuohang Wang · Zhe Gan · Yu Cheng · Jianfeng Gao · Ahmed Awadallah · Bo Li -
2021 Poster: Stronger NAS with Weaker Predictors »
Junru Wu · Xiyang Dai · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Ye Yu · Zhangyang Wang · Zicheng Liu · Mei Chen · Lu Yuan -
2021 Poster: IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers »
Bowen Pan · Rameswar Panda · Yifan Jiang · Zhangyang Wang · Rogerio Feris · Aude Oliva -
2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 Poster: Hyperparameter Tuning is All You Need for LISTA »
Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin -
2021 Poster: Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective »
Tianlong Chen · Yu Cheng · Zhe Gan · Jingjing Liu · Zhangyang Wang -
2021 Poster: TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up »
Yifan Jiang · Shiyu Chang · Zhangyang Wang -
2021 Poster: AugMax: Adversarial Composition of Random Augmentations for Robust Training »
Haotao Wang · Chaowei Xiao · Jean Kossaifi · Zhiding Yu · Anima Anandkumar · Zhangyang Wang -
2021 Poster: Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems »
Wenqing Zheng · Qiangqiang Guo · Hao Yang · Peihao Wang · Zhangyang Wang -
2021 Poster: The Elastic Lottery Ticket Hypothesis »
Xiaohan Chen · Yu Cheng · Shuohang Wang · Zhe Gan · Jingjing Liu · Zhangyang Wang -
2021 Poster: Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot? »
Xiaolong Ma · Geng Yuan · Xuan Shen · Tianlong Chen · Xuxi Chen · Xiaohan Chen · Ning Liu · Minghai Qin · Sijia Liu · Zhangyang Wang · Yanzhi Wang -
2021 Poster: You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership »
Xuxi Chen · Tianlong Chen · Zhenyu Zhang · Zhangyang Wang -
2020 Poster: Graph Contrastive Learning with Augmentations »
Yuning You · Tianlong Chen · Yongduo Sui · Ting Chen · Zhangyang Wang · Yang Shen -
2020 Poster: Robust Pre-Training by Adversarial Contrastive Learning »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang -
2020 Poster: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Spotlight: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Poster: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free »
Haotao Wang · Tianlong Chen · Shupeng Gui · TingKuei Hu · Ji Liu · Zhangyang Wang -
2020 Poster: GreedyFool: Distortion-Aware Sparse Adversarial Attack »
Xiaoyi Dong · Dongdong Chen · Jianmin Bao · Chuan Qin · Lu Yuan · Weiming Zhang · Nenghai Yu · Dong Chen -
2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang Wang · Michael Carbin -
2020 Demonstration: MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval »
Mark Hamilton · Stephanie Fu · Mindren Lu · Johnny Bui · Margaret Wang · Felix Tran · Marina Rogers · Darius Bopp · Christopher Hoder · Lei Zhang · Bill Freeman -
2019 Workshop: AI for Humanitarian Assistance and Disaster Response »
Ritwik Gupta · Robin Murphy · Trevor Darrell · Eric Heim · Zhangyang Wang · Bryce Goodman · Piotr Biliński -
2019 Poster: E2-Train: Training State-of-the-art CNNs with Over 80% Less Energy »
Ziyu Jiang · Yue Wang · Xiaohan Chen · Pengfei Xu · Yang Zhao · Yingyan Lin · Zhangyang Wang -
2019 Poster: Learning to Optimize in Swarms »
Yue Cao · Tianlong Chen · Zhangyang Wang · Yang Shen -
2019 Poster: Model Compression with Adversarial Robustness: A Unified Optimization Framework »
Shupeng Gui · Haotao Wang · Haichuan Yang · Chen Yu · Zhangyang Wang · Ji Liu -
2018 Poster: Can We Gain More from Orthogonality Regularizations in Training Deep Networks? »
Nitin Bansal · Xiaohan Chen · Zhangyang Wang -
2018 Poster: Dialog-based Interactive Image Retrieval »
Xiaoxiao Guo · Hui Wu · Yu Cheng · Steven Rennie · Gerald Tesauro · Rogerio Feris -
2018 Poster: Turbo Learning for CaptionBot and DrawingBot »
Qiuyuan Huang · Pengchuan Zhang · Dapeng Wu · Lei Zhang -
2018 Poster: Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds »
Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin -
2018 Spotlight: Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds »
Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin -
2017 Poster: Triangle Generative Adversarial Networks »
Zhe Gan · Liqun Chen · Weiyao Wang · Yuchen Pu · Yizhe Zhang · Hao Liu · Chunyuan Li · Lawrence Carin -
2017 Poster: VAE Learning via Stein Variational Gradient Descent »
Yuchen Pu · Zhe Gan · Ricardo Henao · Chunyuan Li · Shaobo Han · Lawrence Carin -
2017 Poster: Deconvolutional Paragraph Representation Learning »
Yizhe Zhang · Dinghan Shen · Guoyin Wang · Zhe Gan · Ricardo Henao · Lawrence Carin -
2017 Poster: Adversarial Symmetric Variational Autoencoder »
Yuchen Pu · Weiyao Wang · Ricardo Henao · Liqun Chen · Zhe Gan · Chunyuan Li · Lawrence Carin