Poster

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Leo Schwinn ⋅ David Dobre ⋅ Sophie Xhonneux ⋅ Gauthier Gidel ⋅ Stephan Günnemann
2024 Poster
