

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Sicheng Zhu · Ruiyi Zhang · Bang An · Gang Wu · Joe Barrow · Zichao Wang · Furong Huang · Ani Nenkova · Tong Sun

Abstract
