

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Aidan Ewart · Abhay Sheshadri · Phillip Guo · Aengus Lynch · Cindy Wu · Vivek Hebbar · Henry Sleight · Asa Cooper Stickland · Ethan Perez · Dylan Hadfield-Menell · Stephen Casper

Abstract
