Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems in resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, requiring up to 23× fewer training FLOPs, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained by adapting SOTA sparse training methods to the CL setting, in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
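The abstract describes gradient sparsity (DGM) only at a high level; the full algorithm is in the paper. As a rough, hypothetical illustration of the general idea of masking gradient updates, not SparCL's actual implementation, a magnitude-based top-k gradient mask could look like this (the function name and the fixed `keep_ratio` are assumptions for the sketch):

```python
import numpy as np

def topk_gradient_mask(grad: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep only the largest-magnitude fraction of gradient entries,
    zeroing the rest. Updating only the surviving entries reduces the
    cost of the backward pass and weight update."""
    flat = np.abs(grad).ravel()
    k = max(1, int(keep_ratio * flat.size))
    # Threshold = k-th largest absolute gradient value.
    threshold = np.partition(flat, -k)[-k]
    mask = np.abs(grad) >= threshold
    return grad * mask

grad = np.array([[0.1, -0.9],
                 [0.05, 0.4]])
sparse_grad = topk_gradient_mask(grad, keep_ratio=0.5)
# Only the two largest-magnitude entries (-0.9 and 0.4) survive.
```

In SparCL the mask is dynamic (it adapts across tasks and training iterations) rather than a fixed per-step top-k, so this sketch shows only the masking primitive itself.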
Author Information
Zifeng Wang (Northeastern University)
PhD in Machine Learning @Northeastern, Research Intern @GoogleAI. Continual learning, Data & Parameter-efficient learning, Adversarial Robustness.
Zheng Zhan (Northeastern University)
Yifan Gong (Northeastern University)
Geng Yuan (Northeastern University)
Wei Niu (The College of William and Mary)
Tong Jian (Northeastern University)
Bin Ren (Department of Computer Science, College of William and Mary)
Stratis Ioannidis (Northeastern University)
Yanzhi Wang (Northeastern University)
Jennifer Dy (Northeastern University)
More from the Same Authors

- 2020: Paper 20: YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design »
  Yuxuan Cai · Wei Niu · Yanzhi Wang
- 2021 Spotlight: MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge »
  Geng Yuan · Xiaolong Ma · Wei Niu · Zhengang Li · Zhenglun Kong · Ning Liu · Yifan Gong · Zheng Zhan · Chaoyang He · Qing Jin · Siyue Wang · Minghai Qin · Bin Ren · Yanzhi Wang · Sijia Liu · Xue Lin
- 2021 Spotlight: Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space »
  Sandesh Ghimire · Aria Masoomi · Jennifer Dy
- 2021: Unsupervised Approaches for Out-Of-Distribution Dermoscopic Lesion Detection »
  Max Torop · Sandesh Ghimire · Dana H Brooks · Octavia Camps · Milind Rajadhyaksha · Kivanc Kose · Jennifer Dy
- 2022 Poster: Advancing Model Pruning via Bi-level Optimization »
  Yihua Zhang · Yuguang Yao · Parikshit Ram · Pu Zhao · Tianlong Chen · Mingyi Hong · Yanzhi Wang · Sijia Liu
- 2022 Poster: Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training »
  Geng Yuan · Yanyu Li · Sheng Li · Zhenglun Kong · Sergey Tulyakov · Xulong Tang · Yanzhi Wang · Jian Ren
- 2022 Poster: EfficientFormer: Vision Transformers at MobileNet Speed »
  Yanyu Li · Geng Yuan · Yang Wen · Ju Hu · Georgios Evangelidis · Sergey Tulyakov · Yanzhi Wang · Jian Ren
- 2021 Poster: ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers »
  Husheng Han · Kaidi Xu · Xing Hu · Xiaobing Chen · Ling Liang · Zidong Du · Qi Guo · Yanzhi Wang · Yunji Chen
- 2021 Poster: Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space »
  Sandesh Ghimire · Aria Masoomi · Jennifer Dy
- 2021 Poster: Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot? »
  Xiaolong Ma · Geng Yuan · Xuan Shen · Tianlong Chen · Xuxi Chen · Xiaohan Chen · Ning Liu · Minghai Qin · Sijia Liu · Zhangyang Wang · Yanzhi Wang
- 2021 Poster: MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge »
  Geng Yuan · Xiaolong Ma · Wei Niu · Zhengang Li · Zhenglun Kong · Ning Liu · Yifan Gong · Zheng Zhan · Chaoyang He · Qing Jin · Siyue Wang · Minghai Qin · Bin Ren · Yanzhi Wang · Sijia Liu · Xue Lin
- 2021 Poster: Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness »
  Zifeng Wang · Tong Jian · Aria Masoomi · Stratis Ioannidis · Jennifer Dy
- 2020 Workshop: International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020) »
  Xiaolin Andy Li · Dejing Dou · Ameet Talwalkar · Hongyu Li · Jianzong Wang · Yanzhi Wang
- 2020 Poster: Neural Topographic Factor Analysis for fMRI Data »
  Eli Sennesh · Zulqarnain Khan · Yiyu Wang · J Benjamin Hutchinson · Ajay Satpute · Jennifer Dy · Jan-Willem van de Meent
- 2020 Poster: Instance-wise Feature Grouping »
  Aria Masoomi · Chieh T Wu · Tingting Zhao · Zifeng Wang · Peter Castaldi · Jennifer Dy
- 2019 Poster: Solving Interpretable Kernel Dimensionality Reduction »
  Chieh Wu · Jared Miller · Yale Chang · Mario Sznaier · Jennifer Dy