Show or Tell? Interactive Task Learning with Large Language Models
Abstract
Large Language Models (LLMs) can perform tasks specified in natural language, making them accessible to users regardless of technical background. However, specifying tasks within a single, static prompt is often both difficult and suboptimal. Interactive Task Learning (ITL), a goal for autonomous agents, proposes to address this challenge through multi-turn interactions: teachers provide a task description and (optionally) a demonstration, agents attempt the task while asking clarifying questions, and teachers offer feedback. Despite ITL's promise, systematic evaluation of LLMs' interactive learning capabilities remains limited. We introduce the ListOps Domain, a novel testbed for evaluating models' ability to learn compositional symbolic tasks through ITL. We evaluate small-to-medium-sized LLMs (4 to 32 billion parameters) and find that a limited form of teacher feedback, which only reminds the agent of broken rules rather than explicitly identifying or correcting errors, enhances generalization. Using this feedback, we compare models' ITL and Few-Shot Learning (FSL) capabilities and find that ITL frequently outperforms FSL, especially with more powerful models. We conclude with a discussion of limitations and recommendations for advancing ITL research.