Continuous Control With Ensemble Deep Deterministic Policy Gradients
Piotr Januszewski · Mateusz Olko · Michał Królikowski · Jakub Swiatkowski · Marcin Andrychowicz · Łukasz Kuciński · Piotr Miłoś
Event URL: https://openreview.net/forum?id=TIUfoXsnxB
The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand the interplay between individual elements of the RL toolbox. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of a fundamental nature, including: the commonly used additive action noise is not required for effective exploration and can even hinder training; the performance of policies trained using existing methods varies significantly across training runs, epochs of training, and evaluation runs; the critics' initialization plays the major role in ensemble-based actor-critic exploration, while training is mostly invariant to the actors' initialization; a strategy based on posterior sampling explores better than approximated UCB combined with the weighted Bellman backup; the weighted Bellman backup alone cannot replace clipped double Q-learning. In conclusion, we show how existing tools can be brought together in a novel way, giving rise to the Ensemble Deep Deterministic Policy Gradients (ED2) method, which yields state-of-the-art results on continuous control tasks from OpenAI Gym MuJoCo. On the practical side, ED2 is conceptually straightforward, easy to code, and does not require knowledge outside of the existing RL toolbox.
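
For readers unfamiliar with clipped double Q-learning, which the abstract names as a component the weighted Bellman backup cannot replace, the following is a minimal sketch of the standard target computation as commonly used in TD3-style methods. It is not taken from the ED2 code; the function name, argument names, and the default discount factor are illustrative assumptions.

```python
from typing import List, Sequence


def clipped_double_q_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not from the source
) -> List[float]:
    """Bellman targets built from the minimum of two target-critic estimates."""
    targets = []
    for r, d, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        # Taking the smaller estimate counteracts overestimation bias.
        q_min = min(q1, q2)
        # Terminal transitions do not bootstrap from the next state.
        targets.append(r + gamma * (0.0 if d else q_min))
    return targets
```

Both critics would then be regressed toward the same target; how ED2 itself combines this with an ensemble is described in the paper, not in this sketch.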

Author Information

Piotr Januszewski (University of Warsaw)
Mateusz Olko (University of Warsaw)
Michał Królikowski (University of Warsaw)
Jakub Swiatkowski (University of Warsaw)
Marcin Andrychowicz (Google DeepMind)
Łukasz Kuciński (Polish Academy of Sciences)
Piotr Miłoś (Polish Academy of Sciences, University of Oxford)
