Bridging Symbols from Language and Hierarchical Reinforcement Learning with Active Imitation
Abstract
Large Language Models (LLMs) have shown potential for interacting with reinforcement learning (RL) agents, but the main challenge is to align the world model learned by the agent with a representation compatible with LLMs. We address this problem with SGIM-STAR, a hierarchical RL algorithm that builds a discrete world representation online through RL exploration. SGIM-STAR augments STAR with a partition-wise, learning-progress-driven switch between a learned Q-learning Navigator and an LLM Navigator: the agent constructs a discrete reachability-based partition online and uses intrinsic motivation to query the LLM only when beneficial, defaulting to the learned navigator otherwise. This makes LLM usage cost-aware: the learned navigator dominates early, and the LLM is leveraged as the representation matures. On AntMaze, SGIM-STAR achieves the best and most stable success rate among STAR, an LLM-only variant, and a non-partitioned adaptive variant, avoiding mid-training collapses while reducing LLM calls. These results demonstrate a practical fusion of LLMs with emerging symbolic world models for long-horizon tasks.
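To make the partition-wise switching rule concrete, the following Python sketch shows one plausible learning-progress gate between the two navigators. All names (history, learning_progress, choose_navigator, WINDOW, PROGRESS_EPS) and the specific progress estimate are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# A minimal sketch of a partition-wise, learning-progress-driven switch
# between a learned Q-learning navigator and an LLM navigator.
# All identifiers and thresholds here are hypothetical.
from collections import defaultdict, deque

WINDOW = 20          # assumed number of recent episodes per partition
PROGRESS_EPS = 0.01  # assumed threshold: below this progress, query the LLM

# Per-partition history of recent success (1.0) / failure (0.0) outcomes.
history = defaultdict(lambda: deque(maxlen=WINDOW))

def learning_progress(partition_id):
    """Estimate progress as the absolute change in mean success between
    the older and newer halves of this partition's recent outcomes."""
    h = list(history[partition_id])
    if len(h) < WINDOW:
        # Too little data: report high progress so the cheap learned
        # navigator keeps control early on.
        return float("inf")
    half = len(h) // 2
    older, newer = h[:half], h[half:]
    return abs(sum(newer) / len(newer) - sum(older) / len(older))

def choose_navigator(partition_id):
    """Query the LLM only when learning progress in this partition has
    plateaued; otherwise default to the learned Q-learning navigator."""
    if learning_progress(partition_id) < PROGRESS_EPS:
        return "llm"
    return "q_learning"

def record_outcome(partition_id, success):
    """Log an episode outcome for the given partition."""
    history[partition_id].append(1.0 if success else 0.0)
```

Under this reading, partitions with few or still-changing statistics stay with the cheap learned navigator, and LLM queries concentrate where learning has plateaued, consistent with the abstract's claim that the learned navigator dominates early while the LLM is leveraged as the representation matures.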