Learning from Trajectories via Subgoal Discovery
Sujoy Paul · Jeroen Vanbaar · Amit Roy-Chowdhury

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #206

Learning to solve complex goal-oriented tasks with sparse, terminal-only rewards often requires an enormous number of samples. In such cases, a set of expert trajectories can help the agent learn faster. However, Imitation Learning (IL) via supervised pre-training on these trajectories may not perform well and generally requires additional fine-tuning with an expert in the loop. In this paper, we propose an approach that uses the expert trajectories to learn to decompose the complex main task into smaller sub-goals. We learn a function that partitions the state space into sub-goals, which can then be used to design an extrinsic reward function. We follow a strategy in which the agent first learns from the trajectories using IL and then switches to Reinforcement Learning (RL) using the identified sub-goals, to alleviate the errors of the IL step. To deal with states that are under-represented in the trajectory set, we also learn a function to modulate the sub-goal predictions. We show that our method is able to solve complex goal-oriented tasks that other RL and IL methods, and their combinations in the literature, are not able to solve.
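
The idea of partitioning the state space into sub-goals and deriving a reward from that partition can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: initial sub-goal labels come from splitting each expert trajectory into equal-length segments, the learned partition function is replaced by a nearest-centroid rule, and the reward is simply the change in predicted sub-goal index. All function names are hypothetical, and the modulation of predictions for under-represented states is not covered.

```python
import math

def label_by_equipartition(trajectory, num_subgoals):
    """Label each state with a sub-goal index by cutting the trajectory
    into equal-length consecutive segments (an illustrative assumption)."""
    n = len(trajectory)
    return [min(i * num_subgoals // n, num_subgoals - 1) for i in range(n)]

def fit_centroids(trajectories, num_subgoals):
    """Stand-in for the learned partition function: one mean state per sub-goal.
    Assumes every trajectory has at least num_subgoals states."""
    dim = len(trajectories[0][0])
    sums = [[0.0] * dim for _ in range(num_subgoals)]
    counts = [0] * num_subgoals
    for traj in trajectories:
        for state, g in zip(traj, label_by_equipartition(traj, num_subgoals)):
            counts[g] += 1
            for d in range(dim):
                sums[g][d] += state[d]
    return [[s / counts[g] for s in sums[g]] for g in range(num_subgoals)]

def predict_subgoal(state, centroids):
    """Return the index of the sub-goal whose centroid is closest to the state."""
    return min(range(len(centroids)), key=lambda g: math.dist(state, centroids[g]))

def subgoal_reward(state, next_state, centroids):
    """Hypothetical extrinsic reward: positive when the agent moves to a later
    sub-goal, negative when it falls back, zero otherwise."""
    return float(predict_subgoal(next_state, centroids) - predict_subgoal(state, centroids))

# Toy usage: one 1-D expert trajectory split into three sub-goals.
expert = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
centroids = fit_centroids([expert], num_subgoals=3)
reward = subgoal_reward([1.0], [2.0], centroids)
```

In this toy setting the agent is rewarded only when a transition crosses from one region of the partition into a later one, which gives denser feedback than a terminal-only reward.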

Author Information

Sujoy Paul (UC Riverside)
Jeroen Vanbaar (MERL (Mitsubishi Electric Research Laboratories), Cambridge MA)
Amit Roy-Chowdhury (University of California, Riverside, USA)