Emergent Deceptive Behaviors in Reward-Optimizing LLMs

Yujun Zhou · Han Bao · Yue Huang · Kehan Guo · Zhenwen Liang · Pin-Yu Chen · Tian Gao · Werner Geyer · Nuno Moniz · Nitesh Chawla · Xiangliang Zhang

Abstract
