NeurIPS 2024

Workshop

Rethinking Backdoor Detection Evaluation for Language Models
Jun Yan · Wenjie Mo · Xiang Ren · Robin Jia

Workshop

The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You · Daniel Lowd

Workshop

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper

Workshop

Universal Jailbreak Backdoors in Large Language Model Alignment
Thomas Baumann

Workshop

Sun 13:30

Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks

Workshop

Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Alex Unnervik · Hatef Otroshi Shahreza · Anjith George · Sébastien Marcel
