Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

Yiwen Zhu ⋅ Jinyi Liu ⋅ Yifu Yuan ⋅ Wenya Wei ⋅ Zhenxing Ge ⋅ Qianyi Fu ⋅ Zhou Fang ⋅ Yujing Hu ⋅ Bo An

Abstract
