Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Hamish Ivison ⋅ Yizhong Wang ⋅ Jiacheng Liu ⋅ Zeqiu Wu ⋅ Valentina Pyatkin ⋅ Nathan Lambert ⋅ Noah Smith ⋅ Yejin Choi ⋅ Hannaneh Hajishirzi

Abstract