SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization

Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang

Abstract