Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?

Haizhong Zheng · Jiawei Zhao · Beidi Chen

Abstract