Invited Talk 1 - Natasha Jaques
Abstract
Title: Multi-turn Reinforcement Learning for LLMs: Optimizing User Curiosity and Adversarially Ensuring Realism
Abstract: Although Reinforcement Learning (RL) training has contributed to massive gains in Large Language Model (LLM) abilities, it remains largely limited to optimizing a single response to a user query rather than learning how to plan the course of a conversation or interaction. This talk discusses two recent works that extend multi-turn RL to address critical challenges in long-horizon interaction. The first introduces a curiosity-based intrinsic reward that enables LLMs to learn how to learn about the user, significantly improving both personalization and online generalization to new users. The second introduces Generative Adversarial Post Training (GAPT), an adversarial RL framework that draws on GANs and is designed to mitigate reward hacking and output collapse in creative, adaptive tasks where preserving diversity and realism is paramount. Together, these methods demonstrate novel approaches to instilling complex, user-aware planning capabilities and safeguarding output quality over extended, multi-turn interactions.
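The abstract does not specify the form of the curiosity reward. One common way to instantiate such a reward, shown here purely as an illustrative sketch and not as the talk's actual method, is information gain: the agent is rewarded for turns that reduce a user model's uncertainty about the user. All names below (entropy, infogain_reward, the discrete user types) are hypothetical assumptions.

import math

def entropy(probs):
    """Shannon entropy of a discrete belief over user types."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def infogain_reward(belief_before, belief_after):
    """Intrinsic reward = reduction in uncertainty about the user.

    belief_before / belief_after: probability distributions over a
    discrete set of user types, produced by some user model before and
    after the agent's latest turn and the user's reply.
    """
    return entropy(belief_before) - entropy(belief_after)

# Example: a clarifying question that sharpens the belief earns positive reward.
before = [0.25, 0.25, 0.25, 0.25]   # uniform belief over 4 user types
after = [0.70, 0.10, 0.10, 0.10]    # the reply revealed information
print(f"intrinsic reward: {infogain_reward(before, after):.3f}")  # > 0

Under this formulation, turns that elicit information about the user earn positive intrinsic reward, which is one way to operationalize "learning how to learn about the user" over a multi-turn horizon.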
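Likewise, the abstract describes GAPT only as an adversarial RL framework drawing on GANs. The sketch below shows one generic way a GAN-style realism term can enter post-training, as an assumption for illustration rather than GAPT's actual formulation: a discriminator learns to separate human text from policy samples, and its realism score is mixed into the policy's reward so that collapsed or reward-hacked outputs are penalized.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores text embeddings; higher logit = judged human-written (real)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, emb):
        return self.net(emb).squeeze(-1)

def adversarial_reward(disc, emb, task_reward, lam=0.5):
    """Mix task reward with a realism term, log D(x), from the discriminator.

    Keeping the realism term in the reward penalizes degenerate outputs
    that a discriminator can easily spot, countering reward hacking and
    output collapse.
    """
    realism = torch.log(torch.sigmoid(disc(emb)) + 1e-8)
    return task_reward + lam * realism

def discriminator_loss(disc, real_emb, fake_emb):
    """Standard GAN-style binary classification: human text vs. policy samples."""
    bce = nn.BCEWithLogitsLoss()
    real_loss = bce(disc(real_emb), torch.ones(real_emb.shape[0]))
    fake_loss = bce(disc(fake_emb), torch.zeros(fake_emb.shape[0]))
    return real_loss + fake_loss

# Usage sketch (embeddings are stand-ins for encoded text):
disc = Discriminator(dim=16)
real = torch.randn(8, 16)   # human-written samples
fake = torch.randn(8, 16)   # policy samples
d_loss = discriminator_loss(disc, real, fake)            # train D with this
r = adversarial_reward(disc, fake, torch.zeros(8))       # reward signal for RL

The discriminator and policy are trained against each other, as in a GAN; the abstract's claim is that this adversarial pressure preserves diversity and realism where a fixed reward model would be gamed.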