

Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning

Bingning Huang ⋅ Tu Nguyen ⋅ Matthieu Zimmer

Abstract
