M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization

Bizhe Bai · Hongming Wu · Peng Ye · Tao Chen

Abstract
