Efficient Reinforcement Learning for Optimizing Multi-turn Student Outcomes with LLM Tutors
Abstract
Large language models (LLMs) built on existing reinforcement learning from human feedback (RLHF) frameworks typically optimize the immediate response at each turn. However, this can fail in multi-turn dialogue settings such as online math tutoring, where a single-turn-optimal tutor may simply give away answers instead of guiding the student step by step. We introduce a method that enhances LLM-based tutors by representing the dialogue history with a lower-dimensional (student) state representation and optimizing a long-term policy that selects high-level actions given that state. This better aligns the tutor with the long-term objective of helping the student solve the target math problem(s) independently. Because it operates on lower-dimensional states and high-level actions, our approach is more computationally efficient than training the tutor policy end-to-end to directly generate the tutor's response. In LLM-simulated tutoring scenarios evaluated on GSM8K, our approach improves students' long-term outcomes by 50% compared to prompting baselines.
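As a rough illustration of the idea summarized above (not the paper's implementation), the Python sketch below shows a long-horizon policy that maps a hand-crafted, lower-dimensional student state to a high-level tutoring action, leaving the actual response wording to the LLM. The state features, the action set, and the tabular Q-learning update are illustrative assumptions chosen to keep the example self-contained.

from dataclasses import dataclass
import random

@dataclass
class StudentState:
    # Low-dimensional summary of the dialogue history rather than the raw text.
    steps_completed: int       # solution steps the student has finished so far
    last_answer_correct: bool  # whether the student's last attempt was correct
    hints_given: int           # hints the tutor has already provided

# Hypothetical high-level tutoring actions; the LLM verbalizes the chosen action.
HIGH_LEVEL_ACTIONS = ["ask_guiding_question", "give_hint", "confirm_step", "encourage"]

def featurize(state: StudentState) -> tuple:
    # Hashable feature vector used as the table key.
    return (state.steps_completed, int(state.last_answer_correct), state.hints_given)

class TabularQPolicy:
    """Long-horizon policy over high-level actions; because the state and action
    spaces are small, learning is far cheaper than end-to-end token generation."""

    def __init__(self, epsilon: float = 0.1, lr: float = 0.1, gamma: float = 0.95):
        self.q = {}  # maps (features, action) -> estimated long-term value
        self.epsilon, self.lr, self.gamma = epsilon, lr, gamma

    def act(self, state: StudentState) -> str:
        # Epsilon-greedy action selection over the high-level action set.
        feats = featurize(state)
        if random.random() < self.epsilon:
            return random.choice(HIGH_LEVEL_ACTIONS)
        return max(HIGH_LEVEL_ACTIONS, key=lambda a: self.q.get((feats, a), 0.0))

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update toward the long-term tutoring reward
        # (e.g., whether the student eventually solves the problem on their own).
        feats, next_feats = featurize(state), featurize(next_state)
        best_next = max(self.q.get((next_feats, a), 0.0) for a in HIGH_LEVEL_ACTIONS)
        old = self.q.get((feats, action), 0.0)
        self.q[(feats, action)] = old + self.lr * (reward + self.gamma * best_next - old)

In this sketch the dialogue itself never enters the RL loop: a separate (assumed) component would summarize the conversation into a StudentState, the policy picks an action such as "give_hint", and the LLM turns that action into the tutor's next message.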