Deep Exploration via Bootstrapped DQN

Ian Osband · Charles Blundell · Alexander Pritzel · Benjamin Van Roy

Area 5+6+7+8 #154

Keywords: [ Ensemble Methods and Boosting ] [ Reinforcement Learning Algorithms ] [ Deep Learning or Neural Networks ]


Efficient exploration remains a major challenge for reinforcement learning (RL). Common dithering strategies for exploration, such as epsilon-greedy, do not carry out temporally-extended (or deep) exploration; this can lead to exponentially larger data requirements. However, most algorithms for statistically efficient RL are not computationally tractable in complex environments. Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. As a first step towards addressing such contexts, we develop bootstrapped DQN. We demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than any dithering strategy. In the Arcade Learning Environment, bootstrapped DQN substantially improves learning speed and cumulative performance across most games.
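
The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way to read "randomized value functions via the bootstrap", not the authors' implementation. It assumes several value estimates ("heads"), one head sampled per episode and followed greedily for the whole episode, and a random mask deciding which heads learn from each transition. For brevity it uses small Q-tables on a toy chain instead of a deep network; every name and parameter here (make_heads, update_heads, run_episode, mask_prob, the chain itself) is hypothetical.

```python
import random


def make_heads(num_heads, num_states, num_actions, rng):
    """Create independent Q-tables with small random initial values (assumed init)."""
    return [
        [[rng.uniform(0.0, 0.01) for _ in range(num_actions)]
         for _ in range(num_states)]
        for _ in range(num_heads)
    ]


def greedy_action(q_table, state):
    """Return the action with the highest estimated value in `state`."""
    values = q_table[state]
    return max(range(len(values)), key=lambda a: values[a])


def update_heads(heads, mask, state, action, reward, next_state, done,
                 lr=0.5, gamma=0.99):
    """Apply a Q-learning update to each head whose mask entry is nonzero."""
    for q_table, use in zip(heads, mask):
        if not use:
            continue  # this head never sees the transition
        target = reward if done else reward + gamma * max(q_table[next_state])
        q_table[state][action] += lr * (target - q_table[state][action])


def run_episode(heads, chain_length, horizon, rng, mask_prob=0.5):
    """Run one episode on a toy chain, following a single sampled head throughout."""
    active = rng.randrange(len(heads))  # sampled once per episode, not per step
    state, total = 0, 0.0
    for _ in range(horizon):
        action = greedy_action(heads[active], state)  # 0 = left, 1 = right
        if action == 1:
            next_state = min(state + 1, chain_length - 1)
        else:
            next_state = max(state - 1, 0)
        done = next_state == chain_length - 1  # reward only at the far end
        reward = 1.0 if done else 0.0
        mask = [1 if rng.random() < mask_prob else 0 for _ in heads]
        update_heads(heads, mask, state, action, reward, next_state, done)
        total += reward
        state = next_state
        if done:
            break
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    heads = make_heads(num_heads=5, num_states=8, num_actions=2, rng=rng)
    returns = [run_episode(heads, chain_length=8, horizon=20, rng=rng)
               for _ in range(200)]
    print("episodes that reached the goal:", int(sum(returns)))
```

The design point this sketch is meant to illustrate is the contrast with epsilon-greedy: the randomness enters once, through the choice of head, so the agent commits to one consistent hypothesis about the values for an entire episode rather than perturbing individual actions.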