

Faster, More Efficient RLHF through Off-Policy Asynchronous Learning

Michael Noukhovitch · Shengyi Huang · Sophie Xhonneux · Arian Hosseini · Rishabh Agarwal · Aaron Courville

Abstract
