Skip to yearly menu bar Skip to main content

Workshop: Intrinsically Motivated Open-ended Learning (IMOL) Workshop

Reinforcement Learning of Diverse Skills using Mixture of Deep Experts

Onur Celik · Aleksandar Taranovic · Gerhard Neumann

Keywords: [ mixture of experts ] [ Automatic Curriculum Learning ] [ Reinforcement Learning ] [ Diverse Skill Learning ]


Agents that can acquire diverse skills to solve the same task have a benefit over other agents if e.g. unexpected environmental changes occur. However, Reinforcement Learning (RL) policies mainly rely on Gaussian parameterization, preventing them from learning multi-modal, diverse skills. In this work, we propose a novel RL approach for training policies that exhibit diverse behavior. To this end, we propose a highly non-linear Mixture of Experts (MoE) as the policy representation, where each expert formalizes a skill as a contextual motion primitive. The context defines the task, which can be for instance the goal reaching position of the agent, or changing physical parameters like friction. Given a context, our trained policy first selects an expert out of the repertoire of skills and subsequently adapts the parameters of the contextual motion primitive. To incentivize our policy to learn diverse skills, we leverage a maximum entropy objective combined with a per-expert context distribution that we optimize alongside each expert. The per-expert context distribution allows each expert to focus on a context sub-space and boost learning speed. However, these distributions need to be able to represent multi-modality and hard discontinuities in the environment's context probability space. We solve these requirements by leveraging energy-based models to represent the per-expert context distributions and show how we can efficiently train them using the standard policy gradient objective.

Chat is not available.