NeurIPS 2019 Schedule


Demonstration

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #807

The Option Keyboard: Combining Skills in Reinforcement Learning

Daniel Toyama · Shaobo Hou · Gheorghe Comanici · Andre Barreto · Doina Precup · Shibl Mourad · Eser Aygün · Philippe Hamel

Our paper introduces a modular RL algorithm that provides a temporally extended interface for RL agents, akin to a piano keyboard: the agent chooses among a large selection of “chords”, each a linear combination of “keys” executed over an extended number of environment steps. This added level of abstraction is obtained by pre-training a set of skills corresponding to a finite set of chords, then using generalized policy evaluation and improvement to synthesize any other chord on the fly. We would like to demonstrate the flexibility of the proposed interface by letting audience members perform complex RL tasks by combining a small set of skills that correspond to intuitive short-term objectives. MIDI musical keyboards will be used to control virtual physical bodies both through the original action space (i.e., direct control of body joints) and through abstract action interfaces. The latter concretely illustrates the ability to linearly combine a finite set of skills, akin to playing chords using a small number of keys.
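To make the chord idea concrete, here is a minimal sketch of how one chord could be turned into a single action choice with generalized policy evaluation and improvement. It is an illustration under stated assumptions, not the authors' implementation: the table layout `q_values[skill][key][action]`, the function names `evaluate_chord` and `gpi_action`, and the toy numbers are all hypothetical, and the temporally extended execution of a chord over several environment steps is left out.

```python
from typing import List, Sequence

# Hypothetical layout: q_values[j][i][a] is an estimate, for the current state,
# of the value of action a under pre-trained skill j, measured with respect to
# key i. The shape and the numbers below are made up for illustration.
QTable = Sequence[Sequence[Sequence[float]]]


def evaluate_chord(q_values: QTable, w: Sequence[float]) -> List[List[float]]:
    """Generalized policy evaluation: score every skill on the chord w.

    Because the chord is a linear combination of keys, each skill's value
    under the chord is the same linear combination of its per-key values.
    """
    n_actions = len(q_values[0][0])
    return [
        [sum(w_i * q_key[a] for w_i, q_key in zip(w, q_skill))
         for a in range(n_actions)]
        for q_skill in q_values
    ]


def gpi_action(q_values: QTable, w: Sequence[float]) -> int:
    """Generalized policy improvement: pick the action that is best under
    the most favourable skill for chord w (ties go to the lowest index)."""
    q_w = evaluate_chord(q_values, w)
    n_actions = len(q_w[0])
    best_per_action = [max(q_skill[a] for q_skill in q_w)
                       for a in range(n_actions)]
    return max(range(n_actions), key=best_per_action.__getitem__)


# Toy example: 2 skills, 2 keys, 3 actions.
q_values = [
    [[1.0, 0.0, 0.0],   # skill 0, key 0
     [0.0, 0.0, 1.0]],  # skill 0, key 1
    [[0.0, 2.0, 0.0],   # skill 1, key 0
     [0.0, 0.0, 0.5]],  # skill 1, key 1
]

print(gpi_action(q_values, [1.0, 0.0]))  # chord that presses only key 0
print(gpi_action(q_values, [0.0, 1.0]))  # chord that presses only key 1
print(gpi_action(q_values, [1.0, 1.0]))  # chord that presses both keys
```

In this sketch a new chord needs no further training: changing the weight vector `w` re-weights the stored per-key values, which is the sense in which a small number of keys can yield many chords.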