

Search All 2024 Events
 

31 Results

Workshop
Rethinking Backdoor Detection Evaluation for Language Models
Jun Yan · Wenjie Mo · Xiang Ren · Robin Jia
Workshop
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You · Daniel Lowd
Workshop
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper
Workshop
Universal Jailbreak Backdoors in Large Language Model Alignment
Thomas Baumann
Workshop
Sun 13:30 Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Workshop
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Alex Unnervik · Hatef Otroshi Shahreza · Anjith George · Sébastien Marcel