DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

Hanyang Zhao ⋅ Dawen Liang ⋅ Wenpin Tang ⋅ David Yao ⋅ Nathan Kallus

Abstract
