Workshop
|
|
Rethinking Backdoor Detection Evaluation for Language Models
Jun Yan · Wenjie Mo · Xiang Ren · Robin Jia
|
|
Workshop
|
|
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You · Daniel Lowd
|
|
Workshop
|
|
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper
|
|
Workshop
|
|
Universal Jailbreak Backdoors in Large Language Model Alignment
Thomas Baumann
|
|
Workshop
|
Sun 13:30
|
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
|
|
Workshop
|
|
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Alex Unnervik · Hatef Otroshi Shahreza · Anjith George · Sébastien Marcel
|
|