Poster

Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

Xiaomeng Hu ⋅ Pin-Yu Chen ⋅ Tsung-Yi Ho
2024 Poster

Abstract

