Faster, More Efficient RLHF through Off-Policy Asynchronous Learning

Michael Noukhovitch ⋅ Shengyi Huang ⋅ Sophie Xhonneux ⋅ Arian Hosseini ⋅ Rishabh Agarwal ⋅ Aaron Courville

Abstract