Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts

Maxime Heuillet · Yufei CUI · Boxing Chen · Audrey Durand · Prasanna Parthasarathi

Abstract
