Emergent Deceptive Behaviors in Reward-Optimizing LLMs

Yujun Zhou ⋅ Han Bao ⋅ Yue Huang ⋅ Kehan Guo ⋅ Zhenwen Liang ⋅ Pin-Yu Chen ⋅ Tian Gao ⋅ Werner Geyer ⋅ Nuno Moniz ⋅ Nitesh Chawla ⋅ Xiangliang Zhang

Abstract
