Poster Wed, Dec 3, 2025 • 4:30 PM – 7:30 PM PST

Pre-Trained Policy Discriminators are General Reward Models

Shihan Dou ⋅ Shichun Liu ⋅ Yuming Yang ⋅ Yicheng Zou ⋅ Yunhua Zhou ⋅ Shuhao Xing ⋅ Chenhao Huang ⋅ Qiming Ge ⋅ Haijun Lv ⋅ Demin Song ⋅ Songyang Gao ⋅ Chengqi Lyu ⋅ Enyu Zhou ⋅ Honglin Guo ⋅ Zhiheng Xi ⋅ Qipeng Guo ⋅ Wenwei Zhang ⋅ Tao Gui ⋅ Qi Zhang ⋅ Xipeng Qiu ⋅ Xuanjing Huang ⋅ Kai Chen

Abstract
