
On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
Tim G. J. Rudner · Cong Lu · Michael A Osborne · Yarin Gal · Yee Teh

Tue Dec 07 04:30 PM -- 06:00 PM (PST)

KL-regularized reinforcement learning from expert demonstrations has proved highly successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral policies derived from expert demonstrations suffers from previously unrecognized pathological behavior that can lead to slow, unstable, and suboptimal online training. We show empirically that the pathology occurs for commonly chosen behavioral policy classes, and we demonstrate its impact on sample efficiency and online policy performance. Finally, we show that this pathology can be resolved by specifying a non-parametric behavioral policy, and that doing so allows KL-regularized RL to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks, without ad-hoc algorithmic design choices.

Author Information

Tim G. J. Rudner (University of Oxford)
Cong Lu (University of Oxford)

PhD student in Autonomous Intelligent Machines and Systems at the University of Oxford. Interested in reinforcement learning, Bayesian deep learning and computer vision.

Michael A Osborne (University of Oxford)
Yarin Gal (University of Oxford)
Yee Teh (DeepMind)
