Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

Shuyao Xu · Cheng Peng · Jiangxuan Long · Weidi Xu · Wei Chu · Yuan Qi

Abstract
