
MBAIL: Multi-Batch Best Action Imitation Learning utilizing Sample Transfer and Policy Distillation
Di Wu · · David Meger · Michael Jenkin · Steve Liu · Gregory Dudek

Most online reinforcement learning (RL) algorithms require a large number of interactions with the environment to learn a reliable control policy. Unfortunately, the assumption that repeated interactions with the environment are available does not hold for many real-world applications. Batch RL aims to learn a good control policy from a previously collected dataset without requiring additional interactions with the environment, which makes it very promising for solving real-world problems. However, in the real world, we may have only a limited number of data points for the tasks we are interested in. In addition, most current batch RL methods learn a policy from a single fixed dataset, which makes it hard to learn a policy that performs well across multiple tasks. In this work, we propose to tackle these challenges with sample transfer and policy distillation. The proposed methods are evaluated on multiple control tasks to showcase their effectiveness.

Author Information

Di Wu (Samsung Electronics)
David Meger (McGill University)
Michael Jenkin (York University)
Steve Liu (Samsung Electronics Canada)
Gregory Dudek (Samsung Electronics Canada)

More from the Same Authors