Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Hamish Ivison · Yizhong Wang · Jiacheng Liu · Zeqiu Wu · Valentina Pyatkin · Nathan Lambert · Noah Smith · Yejin Choi · Hannaneh Hajishirzi

Abstract
