Spotlight Poster
Post-Purification Robustness of Backdoor Purification
Rui Min · Zeyu Qin · Nevin L. Zhang · Li Shen · Minhao Cheng
East Exhibit Hall A-C #4508
Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs), as they allow attackers to manipulate model predictions with backdoor triggers. To mitigate these security risks, various backdoor purification methods have been proposed to cleanse compromised models. Purified models typically exhibit low Attack Success Rates (ASR), making them appear resistant to backdoored inputs. However, does achieving a low ASR through current purification methods truly eliminate the inserted backdoor features? In this paper, we answer this question in the negative by thoroughly investigating the Post-Purification Robustness of current purification methods. We find that current purification methods are susceptible to quickly relearning backdoor behavior, even when fine-tuned on an extremely small number of poisoned samples. Based on this, we propose the Query-based Reactivation Attack (QRA), which can effectively reactivate the backdoor by merely querying purified models. We further find that this failure to achieve good post-purification robustness stems from purified models deviating insufficiently from the backdoored model along the backdoor-connected path. To improve post-purification robustness, we propose a simple tuning defense, Path-Aware Minimization (PAM), which promotes deviation along backdoor-connected paths with extra model updates. Extensive experiments demonstrate that PAM significantly improves post-purification robustness while maintaining good clean accuracy and a low attack success rate. Our work offers a new perspective on understanding the effectiveness of backdoor defenses and highlights the importance of faithfully assessing post-purification robustness.
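The relearning vulnerability described above can be probed with a short experiment. Below is a minimal, hypothetical PyTorch sketch of such a post-purification robustness check: fine-tune a purified model on a handful of poisoned samples, then measure how quickly the ASR recovers. Every name here (purified_model, poisoned_loader, trigger_loader, target_label) is an illustrative assumption, not code from the paper.

```python
# Hypothetical post-purification robustness probe: brief fine-tuning on a
# tiny poisoned set, followed by an ASR measurement on triggered inputs.
import torch
import torch.nn.functional as F

def relearning_asr(purified_model, poisoned_loader, trigger_loader,
                   target_label, steps=50, lr=1e-3, device="cpu"):
    """Fine-tune a purified model on a few poisoned samples; return ASR."""
    model = purified_model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    # Brief fine-tuning on an extremely small number of poisoned samples.
    it = iter(poisoned_loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:  # cycle through the small poisoned set
            it = iter(poisoned_loader)
            x, y = next(it)
        opt.zero_grad()
        F.cross_entropy(model(x.to(device)), y.to(device)).backward()
        opt.step()

    # ASR: fraction of triggered inputs classified as the attacker's target.
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for x, _ in trigger_loader:
            pred = model(x.to(device)).argmax(dim=1)
            hits += (pred == target_label).sum().item()
            total += x.size(0)
    return hits / max(total, 1)
```

A purified model with good post-purification robustness should keep this ASR low even after the fine-tuning phase; a high value indicates the backdoor features were suppressed rather than removed.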