Timezone: »
Transfer learning from the model trained on large datasets to customized downstream tasks has been widely used as the pre-trained model can greatly boost the generalizability. However, the increasing sizes of pre-trained models also lead to a prohibitively large memory footprints for downstream transferring, making them unaffordable for personal devices. Previous work recognizes the bottleneck of the footprint to be the activation, and hence proposes various solutions such as injecting specific lite modules. In this work, we present a novel memory-efficient transfer framework called Back Razor, that can be plug-and-play applied to any pre-trained network without changing its architecture. The key idea of Back Razor is asymmetric sparsifying: pruning the activation stored for back-propagation, while keeping the forward activation dense. It is based on the observation that the stored activation, that dominates the memory footprint, is only needed for backpropagation. Such asymmetric pruning avoids affecting the precision of forward computation, thus making more aggressive pruning possible. Furthermore, we conduct the theoretical analysis for the convergence rate of Back Razor, showing that under mild conditions, our method retains the similar convergence rate as vanilla SGD. Extensive transfer learning experiments on both Convolutional Neural Networks and Vision Transformers with classification, dense prediction, and language modeling tasks show that Back Razor could yield up to 97% sparsity, saving 9.2x memory usage, without losing accuracy. The code is available at: https://github.com/VITA-Group/BackRazor_Neurips22.
Author Information
Ziyu Jiang (Texas A&M University)
Xuxi Chen (University of Texas at Austin)
Xueqin Huang (Texas A&M University - College Station)
Xianzhi Du (Apple)
Denny Zhou (Google)
Zhangyang Wang (University of Texas at Austin)
More from the Same Authors
-
2022 : HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing »
Tianlong Chen · Chengyue Gong · Daniel Diaz · Xuxi Chen · Jordan Wells · Qiang Liu · Zhangyang Wang · Andrew Ellington · Alex Dimakis · Adam Klivans -
2022 Spotlight: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang Wang -
2022 Poster: Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets »
Ruisi Cai · Zhenyu Zhang · Tianlong Chen · Xiaohan Chen · Zhangyang Wang -
2022 Poster: Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative »
Tianxin Wei · Yuning You · Tianlong Chen · Yang Shen · Jingrui He · Zhangyang Wang -
2022 Poster: Signal Processing for Implicit Neural Representations »
Dejia Xu · Peihao Wang · Yifan Jiang · Zhiwen Fan · Zhangyang Wang -
2022 Poster: Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork »
Haotao Wang · Junyuan Hong · Aston Zhang · Jiayu Zhou · Zhangyang Wang -
2022 Poster: Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization »
Junru Wu · Yi Liang · feng han · Hassan Akbari · Zhangyang Wang · Cong Yu -
2022 Poster: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis »
Wuyang Chen · Wei Huang · Xinyu Gong · Boris Hanin · Zhangyang Wang -
2022 Poster: Sparse Winning Tickets are Data-Efficient Image Recognizers »
Mukund Varma T · Xuxi Chen · Zhenyu Zhang · Tianlong Chen · Subhashini Venugopalan · Zhangyang Wang -
2022 Poster: Symbolic Distillation for Learned TCP Congestion Control »
S P Sharan · Wenqing Zheng · Kuo-Feng Hsu · Jiarong Xing · Ang Chen · Zhangyang Wang -
2022 Poster: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models »
Jason Wei · Xuezhi Wang · Dale Schuurmans · Maarten Bosma · brian ichter · Fei Xia · Ed Chi · Quoc V Le · Denny Zhou -
2022 Poster: M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design »
hanxue liang · Zhiwen Fan · Rishov Sarkar · Ziyu Jiang · Tianlong Chen · Kai Zou · Yu Cheng · Cong Hao · Zhangyang Wang -
2022 Poster: Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again »
AJAY JAISWAL · Peihao Wang · Tianlong Chen · Justin Rousseau · Ying Ding · Zhangyang Wang -
2022 Poster: A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking »
Keyu Duan · Zirui Liu · Peihao Wang · Wenqing Zheng · Kaixiong Zhou · Tianlong Chen · Xia Hu · Zhangyang Wang -
2021 Poster: Improving Contrastive Learning on Imbalanced Data via Open-World Sampling »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang -
2021 Poster: Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot? »
Xiaolong Ma · Geng Yuan · Xuan Shen · Tianlong Chen · Xuxi Chen · Xiaohan Chen · Ning Liu · Minghai Qin · Sijia Liu · Zhangyang Wang · Yanzhi Wang -
2021 Poster: You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership »
Xuxi Chen · Tianlong Chen · Zhenyu Zhang · Zhangyang Wang -
2020 Workshop: Second Workshop on AI for Humanitarian Assistance and Disaster Response »
Ritwik Gupta · Robin Murphy · Eric Heim · Zhangyang Wang · Bryce Goodman · Nirav Patel · Piotr Bilinski · Edoardo Nemni -
2020 Poster: Graph Contrastive Learning with Augmentations »
Yuning You · Tianlong Chen · Yongduo Sui · Ting Chen · Zhangyang Wang · Yang Shen -
2020 Poster: MATE: Plugging in Model Awareness to Task Embedding for Meta Learning »
Xiaohan Chen · Zhangyang Wang · Siyu Tang · Krikamol Muandet -
2020 Poster: Robust Pre-Training by Adversarial Contrastive Learning »
Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang -
2020 Poster: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Spotlight: Training Stronger Baselines for Learning to Optimize »
Tianlong Chen · Weiyi Zhang · Zhou Jingyang · Shiyu Chang · Sijia Liu · Lisa Amini · Zhangyang Wang -
2020 Poster: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free »
Haotao Wang · Tianlong Chen · Shupeng Gui · TingKuei Hu · Ji Liu · Zhangyang Wang -
2020 Poster: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training »
Yonggan Fu · Haoran You · Yang Zhao · Yue Wang · Chaojian Li · Kailash Gopalakrishnan · Zhangyang Wang · Yingyan Lin -
2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks »
Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang Wang · Michael Carbin -
2020 Poster: ShiftAddNet: A Hardware-Inspired Deep Network »
Haoran You · Xiaohan Chen · Yongan Zhang · Chaojian Li · Sicheng Li · Zihao Liu · Zhangyang Wang · Yingyan Lin -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: E2-Train: Training State-of-the-art CNNs with Over 80% Less Energy »
Ziyu Jiang · Yue Wang · Xiaohan Chen · Pengfei Xu · Yang Zhao · Yingyan Lin · Zhangyang Wang