

Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models

Zizhuo Zhang ⋅ Jianing Zhu ⋅ Xinmu Ge ⋅ Zihua Zhao ⋅ Zhanke Zhou ⋅ Xuan Li ⋅ Xiao Feng ⋅ Jiangchao Yao ⋅ Bo Han

Abstract
