Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

Peter Chen ⋅ Xiaopeng Li ⋅ Ziniu Li ⋅ Xi Chen ⋅ Tianyi Lin

Abstract
