DHP: Discrete Hierarchical Planning for HRL Agents
Abstract
Hierarchical Reinforcement Learning (HRL) agents often fail at long-horizon visual planning because they rely on error-prone distance metrics to choose subgoals. We introduce Discrete Hierarchical Planning (DHP), which evaluates subgoal feasibility with discrete reachability checks instead of continuous distance estimates. DHP builds tree-structured plans that recursively decompose a goal into simpler subtasks and employs a λ-return update that naturally favors shallow decompositions and generalizes beyond the training depth. To improve data efficiency, we add an intrinsic exploration policy that automatically generates informative trajectories for training the planner. On a 25-room navigation benchmark, DHP achieves a 100% success rate (versus 90% for the baseline) with shorter average episode lengths. The method also extends to momentum-based control tasks and requires only O(log N) replanning steps.
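To make the recursive decomposition concrete, below is a minimal sketch of tree-structured subgoal planning with reachability checks and a λ-weighted backup, assuming a toy 1-D corridor. The functions `reachable` and `propose_subgoal` and the constants `LAMBDA` and `MAX_DEPTH` are illustrative stand-ins, not the paper's learned components.

```python
# Minimal sketch of DHP-style recursive subgoal decomposition on a toy
# 1-D corridor. All components here are hypothetical stand-ins for the
# paper's learned reachability and subgoal modules.

LAMBDA = 0.9      # hypothetical lambda for the lambda-return-style backup
MAX_DEPTH = 6     # hypothetical recursion budget

def reachable(a: int, b: int, step: int = 2) -> bool:
    """Stand-in reachability check: true if b is within `step` of a."""
    return abs(a - b) <= step

def propose_subgoal(a: int, b: int) -> int:
    """Stand-in subgoal proposal: midpoint between state and goal."""
    return (a + b) // 2

def plan(a: int, b: int, depth: int = 0):
    """Recursively decompose (a -> b) into a binary tree of subgoals.

    Returns (value, subgoals), where `value` is a lambda-weighted score
    that shrinks with every extra level of decomposition, so shallow
    plans are naturally preferred.
    """
    if reachable(a, b):
        return 1.0, [b]        # leaf: the goal is directly reachable
    if depth >= MAX_DEPTH:
        return 0.0, []         # budget exhausted: treat as infeasible
    mid = propose_subgoal(a, b)
    v_left, p_left = plan(a, mid, depth + 1)
    v_right, p_right = plan(mid, b, depth + 1)
    # Lambda-weighted backup: each level multiplies the value by LAMBDA,
    # penalizing deep decompositions.
    return LAMBDA * min(v_left, v_right), p_left + p_right

if __name__ == "__main__":
    value, subgoals = plan(0, 25)
    print(f"plan value: {value:.3f}")
    print("subgoal sequence:", subgoals)
```

Running the sketch on the 0-to-25 corridor produces a balanced subgoal tree of depth 4 and a value of λ⁴ ≈ 0.66, illustrating how the backup discounts deeper decompositions.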