

Poster

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Hamish Ivison ⋅ Yizhong Wang ⋅ Jiacheng Liu ⋅ Zeqiu Wu ⋅ Valentina Pyatkin ⋅ Nathan Lambert ⋅ Noah Smith ⋅ Yejin Choi ⋅ Hanna Hajishirzi
2024 Poster

Abstract
