

Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Peter Chen ⋅ Xiaopeng Li ⋅ Ziniu Li ⋅ Wotao Yin ⋅ Xi Chen ⋅ Tianyi Lin

Abstract
