

Reward Model Ensembles Help Mitigate Overoptimization

Thomas Coste ⋅ Usman Anwar ⋅ Robert Kirk ⋅ David Krueger

Abstract
