
Behavior Predictive Representations for Generalization in Reinforcement Learning
Siddhant Agarwal · Aaron Courville · Rishabh Agarwal
Event URL: https://openreview.net/forum?id=b5PJaxS6Jxg

Deep reinforcement learning (RL) agents trained on a few environments often struggle to generalize to unseen environments, even when those environments are semantically equivalent to the training environments. Such agents learn representations that overfit the characteristics of the training environments. We posit that generalization can be improved by assigning similar representations to scenarios with similar sequences of long-term optimal behavior. To do so, we propose behavior predictive representations (BPR), which capture long-term optimal behavior. BPR trains an agent to predict latent state representations multiple steps into the future, such that these representations can predict the optimal behavior at those future steps. We demonstrate that BPR provides large gains on a jumping task from pixels, a problem designed to test generalization.
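The training signal described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The linear encoder, the action-conditioned latent transition, the cross-entropy objective, and every name below (`bpr_style_loss`, `W_enc`, `W_dyn`, `W_pi`) are assumptions made for illustration only. The sketch encodes one observation, rolls the latent forward without looking at later observations, and penalizes a shared policy head for failing to predict the optimal action at each future step.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single integer label."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def bpr_style_loss(params, obs, optimal_actions):
    """Average behavior-prediction loss over a latent rollout.

    params: (W_enc, W_dyn, W_pi), hypothetical weight matrices.
    obs: observation at step t, shape (obs_dim,).
    optimal_actions: optimal actions for steps t .. t+K, as integers.
    """
    W_enc, W_dyn, W_pi = params
    num_actions = W_pi.shape[0]

    # Encode only the first observation into a latent state.
    z = np.tanh(W_enc @ obs)
    loss = softmax_cross_entropy(W_pi @ z, optimal_actions[0])

    # Roll the latent forward; future observations are never encoded.
    for k in range(1, len(optimal_actions)):
        prev_action = np.eye(num_actions)[optimal_actions[k - 1]]
        # Assumed transition model: next latent from latent and action.
        z = np.tanh(W_dyn @ np.concatenate([z, prev_action]))
        # The predicted latent must explain the optimal action at step t+k.
        loss += softmax_cross_entropy(W_pi @ z, optimal_actions[k])

    return loss / len(optimal_actions)


# Usage with made-up sizes: 8-dim observation, 4-dim latent, 2 actions.
rng = np.random.default_rng(0)
params = (
    rng.normal(size=(4, 8)),
    rng.normal(size=(4, 4 + 2)),
    rng.normal(size=(2, 4)),
)
example_loss = bpr_style_loss(params, rng.normal(size=8), [0, 0, 1, 0])
```

Because the loss depends on observations only through the first latent, two observations that are followed by the same optimal action sequence are pushed toward latents that behave alike under the rollout, which is the intuition the abstract gives for improved generalization.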

Author Information

Siddhant Agarwal (Indian Institute of Technology Kharagpur)
Aaron Courville (U. Montreal)
Rishabh Agarwal (Google Research, Brain Team)
