Skip to yearly menu bar Skip to main content


Jailbreak Defense in a Narrow Domain: Failures of existing methods and Improving Transcript-Based Classifiers

Tony Wang · John Hughes · Henry Sleight · Rylan Schaeffer · Rajashree Agrawal · Fazl Barez · Mrinank Sharma · Jesse Mu · Nir Shavit · Ethan Perez

Abstract

Chat is not available.