Poster
Towards Stable Backdoor Purification through Feature Shift Tuning
Rui Min · Zeyu Qin · Li Shen · Minhao Cheng
Event URL: https://github.com/AISafety-HKUST/stable_backdoor_purification
It has been widely observed that deep neural networks (DNNs) are vulnerable to backdoor attacks, in which attackers can maliciously manipulate model behavior by tampering with a small set of training samples. Although a line of defense methods has been proposed to mitigate this threat, they either require complicated modifications to the training process or rely heavily on the specific model architecture, which makes them hard to deploy in real-world applications. Therefore, in this paper, we instead start from fine-tuning, one of the most common and easy-to-deploy backdoor defenses, and comprehensively evaluate it against diverse attack scenarios. Our initial experiments show that, in contrast to the promising defensive results at high poisoning rates, vanilla tuning methods completely fail in low-poisoning-rate scenarios. Our analysis shows that at low poisoning rates, the entanglement between backdoor and clean features undermines the effect of tuning-based defenses. Therefore, it is necessary to disentangle the backdoor and clean features in order to improve backdoor purification. To address this, we introduce Feature Shift Tuning (FST), a method for tuning-based backdoor purification. Specifically, FST encourages feature shifts by actively deviating the classifier weights from the originally compromised weights. Extensive experiments demonstrate that FST provides consistently stable performance under different attack settings. Without complex parameter adjustments, FST also achieves a much lower tuning cost of only 10 epochs. Our code is available at https://github.com/AISafety-HKUST/stable_backdoor_purification.
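As a rough illustration of the idea described in the abstract, the sketch below fine-tunes a toy model with a clean cross-entropy loss plus a penalty on the alignment between the current classifier weights and a frozen copy of the original (compromised) classifier weights, so that the classifier, and hence the learned features, are pushed away from the backdoored solution. The toy network, the penalty weight `alpha`, and the training loop are illustrative assumptions rather than the authors' released implementation; consult the linked repository for the exact FST objective and schedule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Toy stand-in for a backdoored network: a feature backbone plus a linear head."""
    def __init__(self, in_dim=32, feat_dim=16, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.classifier(self.backbone(x))

def feature_shift_tuning_step(model, w_orig, x, y, optimizer, alpha=0.1):
    """One tuning step: clean cross-entropy plus a penalty on the alignment
    between the current classifier weights and the original (frozen) weights."""
    optimizer.zero_grad()
    ce = F.cross_entropy(model(x), y)
    # Inner product with the frozen original weights; minimizing it drives the
    # classifier away from the compromised solution, encouraging a feature shift.
    shift_penalty = torch.sum(model.classifier.weight * w_orig)
    loss = ce + alpha * shift_penalty
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage on random data as a stand-in for a small clean tuning set.
model = SmallNet()
w_orig = model.classifier.weight.detach().clone()  # snapshot of the original head
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
for _ in range(10):  # 10 updates on the toy batch (the paper tunes for only about 10 epochs)
    feature_shift_tuning_step(model, w_orig, x, y, optimizer)
```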
Author Information
Rui Min (SenseTime Group Limited)
Zeyu Qin (HKUST)
Ph.D. student at CSE of HKUST
Li Shen (Tencent AI Lab)
Minhao Cheng (Hong Kong University of Science and Technology)
More from the Same Authors
- 2022 Poster: Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
  Peng Mi · Li Shen · Tianhe Ren · Yiyi Zhou · Xiaoshuai Sun · Rongrong Ji · Dacheng Tao
- 2022: Trusted Aggregation (TAG): Model Filtering Backdoor Defense In Federated Learning
  Joseph Lavond · Minhao Cheng · Yao Li
- 2022: FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
  Yuanhao Xiong · Ruochen Wang · Minhao Cheng · Felix Yu · Cho-Jui Hsieh
- 2022: Defend Against Textual Backdoor Attacks By Token Substitution
  Xinglin Li · Yao Li · Minhao Cheng
- 2022: Identification of the Adversary from a Single Adversarial Example
  Minhao Cheng · Rui Min
- 2023 Poster: Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning
  Guozheng Ma · Linrui Zhang · Haoyu Wang · Lu Li · Zilin Wang · Zhen Wang · Li Shen · Xueqian Wang · Dacheng Tao
- 2023: Are Large Language Models Really Robust to Word-Level Perturbations?
  Haoyu Wang · Guozheng Ma · Cong Yu · Gui Ning · Linrui Zhang · Zhiqi Huang · Suwei Ma · Yongzhe Chang · Sen Zhang · Li Shen · Xueqian Wang · Peilin Zhao · Dacheng Tao
- 2023 Poster: Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization
  Yan Sun · Li Shen · Dacheng Tao
- 2023 Poster: Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
  Miaoxi Zhu · Li Shen · Bo Du · Dacheng Tao
- 2023 Poster: An Efficient Dataset Condensation Plugin and Its Application to Continual Learning
  Enneng Yang · Li Shen · Zhenyi Wang · Tongliang Liu · Guibing Guo
- 2023 Poster: Dynamic Sparsity Is Channel-Level Sparsity Learner
  Lu Yin · Gen Li · Meng Fang · Li Shen · Tianjin Huang · Zhangyang "Atlas" Wang · Vlado Menkovski · Xiaolong Ma · Mykola Pechenizkiy · Shiwei Liu
- 2023 Poster: Federated Learning with Manifold Regularization and Normalized Update Reaggregation
  Xuming An · Li Shen · Han Hu · Yong Luo
- 2023 Poster: Defending against Data-Free Model Extraction by Distributionally Robust Defensive Training
  Zhenyi Wang · Li Shen · Tongliang Liu · Tiehang Duan · Yanjun Zhu · Donglin Zhan · David Doermann · Mingchen Gao
- 2023 Poster: Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
  Ziniu Li · Tian Xu · Zeyu Qin · Yang Yu · Zhi-Quan Luo
- 2023 Poster: FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning
  Zhuo Huang · Li Shen · Jun Yu · Bo Han · Tongliang Liu
- 2022 Poster: Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation
  Zeyu Qin · Yanbo Fan · Yi Liu · Li Shen · Yong Zhang · Jue Wang · Baoyuan Wu
- 2022 Poster: MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models
  Erdun Gao · Ignavier Ng · Mingming Gong · Li Shen · Wei Huang · Tongliang Liu · Kun Zhang · Howard Bondell
- 2022 Poster: Efficient Non-Parametric Optimizer Search for Diverse Tasks
  Ruochen Wang · Yuanhao Xiong · Minhao Cheng · Cho-Jui Hsieh
- 2022 Poster: Random Sharpness-Aware Minimization
  Yong Liu · Siqi Mai · Minhao Cheng · Xiangning Chen · Cho-Jui Hsieh · Yang You
- 2021 Poster: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
  Shiwei Liu · Tianlong Chen · Xiaohan Chen · Zahra Atashgahi · Lu Yin · Huanyu Kou · Li Shen · Mykola Pechenizkiy · Zhangyang Wang · Decebal Constantin Mocanu
- 2021 Poster: Random Noise Defense Against Query-Based Black-Box Attacks
  Zeyu Qin · Yanbo Fan · Hongyuan Zha · Baoyuan Wu