Poster in Workshop: Agent Learning in Open-Endedness Workshop

Skill-Conditioned Policy Optimization with Successor Features Representations

Luca Grillotti ⋅ Maxence Faldor ⋅ Borja G. León ⋅ Antoine Cully

Keywords: Reinforcement Learning, Quality-Diversity

Abstract

A key aspect of intelligence is the ability to exhibit a wide range of behaviors in order to adapt to unforeseen situations. Designing artificial agents capable of showcasing a broad spectrum of skills is a long-standing challenge in Artificial Intelligence. In the last decade, progress in deep Reinforcement Learning (RL) has made it possible to solve complex tasks with high-dimensional, continuous state and action spaces. However, most approaches return only one highly specialized solution to a single problem. We introduce a Skill-Conditioned OPtimal Agent (SCOPA) that leverages successor features representations to learn skills that solve a task. We derive a policy skill improvement update with successor features, analogous to the classic policy improvement update, which we use to learn skills. From this result, we develop an algorithm that combines successor features with universal function approximators to learn a skill representation that extends the traditional concept of a goal to trajectory-based skills. We seamlessly unify value function and successor features policy iteration with constrained optimization to (1) maximize performance while (2) executing a skill. Compared with other skill-conditioned RL methods, SCOPA reaches significantly higher performance and skill space coverage on challenging continuous control locomotion tasks with various types of skills. We also demonstrate that the diversity of skills is useful in downstream adaptation tasks. Videos of our results are available at: http://bit.ly/scopa.
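To make the notion of a trajectory-based skill more concrete, the following is a minimal sketch of how successor features could be estimated from a single trajectory and compared with a target skill. It is not the authors' implementation: the function names, the (1 - gamma) normalization, and the Euclidean distance used as the "executing a skill" check are all assumptions made for illustration, and the abstract does not specify the exact constraint used by SCOPA.

```python
from typing import List, Sequence


def successor_features(features: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Monte Carlo estimate of successor features for one trajectory.

    Computes psi = sum_t gamma^t * phi_t, where phi_t is the feature
    vector observed at step t (standard successor features definition).
    """
    dim = len(features[0])
    psi = [0.0] * dim
    discount = 1.0
    for phi in features:
        for i in range(dim):
            psi[i] += discount * phi[i]
        discount *= gamma
    return psi


def skill_distance(psi: Sequence[float], skill: Sequence[float], gamma: float) -> float:
    """Distance between normalized successor features and a target skill.

    Assumption (hypothetical): the skill is read as a target for the
    discounted average of the features, (1 - gamma) * psi, and the
    mismatch is measured with the Euclidean norm.
    """
    normalized = [(1.0 - gamma) * p for p in psi]
    return sum((n - z) ** 2 for n, z in zip(normalized, skill)) ** 0.5


# Toy trajectory: a constant 2-D feature (e.g. a velocity) over 3 steps.
trajectory_features = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
psi = successor_features(trajectory_features, gamma=0.5)
distance = skill_distance(psi, skill=[0.875, 0.0], gamma=0.5)
```

Under these assumptions, a constrained formulation would maximize the return while keeping such a distance below a chosen threshold. Note that for a finite trajectory the (1 - gamma) normalization only approximates a discounted average of the features.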
