DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

Hanyang Zhao · Dawen Liang · Wenpin Tang · David Yao · Nathan Kallus

Abstract
