

San Diego Poster Fri, Dec 5, 2025 • 11:00 AM – 2:00 PM PST Exhibit Hall C,D,E #206

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for large language models

Nicolas Le Roux · Marc Bellemare · Jonathan Lebensold · Arnaud Bergeron · Joshua Greaves · Alexandre Fréchette · Carolyne Pelletier · Eric Thibodeau-Laufer · Sándor Tóth · Sam Work
