Poster

Uncovering Safety Risks of Large Language Models through Concept Activation Vector

Zhihao Xu ⋅ Ruixuan HUANG ⋅ Changyu Chen ⋅ Xiting Wang
2024 Poster

Abstract
