On the Rollout-Training Mismatch in Modern RL Systems

Feng Yao ⋅ Liyuan Liu ⋅ Dinghuai Zhang ⋅ Chengyu Dong ⋅ Jingbo Shang ⋅ Jianfeng Gao

Abstract
