Poster in Workshop: Robustness in Sequence Modeling

Defend Against Textual Backdoor Attacks By Token Substitution

Xinglin Li · Yao Li · Minhao Cheng


Abstract:

Backdoor attacks are a type of malicious threat to deep neural networks (DNNs): the attacker injects a trigger into the model during training. The victim model behaves normally on inputs without the trigger but predicts the attacker-specified target label on inputs that contain it. Backdoor attacks were first studied in computer vision and have recently emerged in natural language processing (NLP). However, the study of defenses against textual backdoor attacks is still insufficient; in particular, few methods can protect against attacks that use syntax as the trigger. In this paper, we propose a novel method that effectively defends against syntactic backdoor attacks. Experiments on BERT demonstrate the effectiveness of our method against syntactic backdoor attacks with five different syntaxes chosen as triggers.
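To make the setting concrete, the following is a minimal, illustrative sketch of a token-substitution consistency check in the spirit suggested by the title. It is not the authors' algorithm: the `predict` classifier, the synonym table, and the decision rule (flag inputs whose label never changes under substitution) are all assumptions introduced here for illustration only.

```python
# Hypothetical sketch: flag inputs whose prediction is invariant under
# token substitution. Intuition (an assumption, not the paper's stated
# method): if a syntactic trigger controls the prediction, replacing
# content words leaves the label stuck at the attacker's target, whereas
# clean predictions usually depend on the content words themselves.
import random
from typing import Callable, Dict, List


def substitute_tokens(tokens: List[str],
                      synonyms: Dict[str, List[str]],
                      ratio: float = 0.3,
                      seed: int = 0) -> List[str]:
    """Replace a fraction of tokens with synonyms, keeping sentence structure."""
    rng = random.Random(seed)
    out = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[: max(1, int(len(tokens) * ratio))]:
        out[i] = rng.choice(synonyms[tokens[i].lower()])
    return out


def is_suspicious(text: str,
                  predict: Callable[[str], int],
                  synonyms: Dict[str, List[str]],
                  n_variants: int = 5) -> bool:
    """Return True if the label never flips across substituted variants."""
    tokens = text.split()
    base = predict(text)
    flips = 0
    for k in range(n_variants):
        variant = " ".join(substitute_tokens(tokens, synonyms, seed=k))
        if predict(variant) != base:
            flips += 1
    return flips == 0  # fully substitution-invariant -> possibly trigger-driven


if __name__ == "__main__":
    # Toy stand-in classifier: predicts 1 iff the word "terrible" appears.
    toy_predict = lambda s: int("terrible" in s.lower())
    toy_synonyms = {"terrible": ["awful", "dreadful"], "movie": ["film"]}
    print(is_suspicious("the movie was terrible", toy_predict, toy_synonyms))
```

In this toy run the prediction flips once "terrible" is substituted, so the input is not flagged; a syntax-triggered input, by contrast, would keep its label under such substitutions and be flagged as suspicious.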
