Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers

Tony Wang ⋅ John Hughes ⋅ Henry Sleight ⋅ Rylan Schaeffer ⋅ Rajashree Agrawal ⋅ Fazl Barez ⋅ Mrinank Sharma ⋅ Jesse Mu ⋅ Nir Shavit ⋅ Ethan Perez

Abstract
