
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo
Randomized least-squares value iteration (RLSVI) is a provably efficient exploration method. However, it is limited to the case where 1) a good feature is known in advance and 2) this feature is fixed during training; otherwise, RLSVI suffers an unbearable computational burden to obtain posterior samples of the parameter of the $Q$-value function. In this work, we present a practical algorithm named HyperDQN that addresses these two issues in the context of deep reinforcement learning, where the feature changes over iterations. HyperDQN is built on two parametric models: in addition to a non-linear neural network (i.e., the base model) that predicts $Q$-values, our method employs a probabilistic hypermodel (i.e., the meta model), which outputs the parameter of the base model. When both models are jointly optimized under a specifically designed objective, three purposes are achieved. First, the hypermodel can generate approximate posterior samples of the parameter of the $Q$-value function. As a result, diverse $Q$-value functions are sampled to select exploratory action sequences. This retains the core idea of RLSVI for efficient exploration. Second, a good feature is learned to approximate $Q$-value functions. This addresses limitation 1. Third, posterior samples of the $Q$-value function can be obtained more efficiently than with existing methods, and the changing feature does not affect this efficiency. This addresses limitation 2. On the Atari 2600 suite, after $20$M samples, HyperDQN achieves about a $2 \times$ improvement over (double) DQN, the advanced method Bootstrapped DQN, and the SOTA exploration bonus method OB2I. On another challenging task, SuperMarioBros, HyperDQN outperforms the baselines on $7$ out of $9$ games.
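To make the split between the base model and the hypermodel concrete, here is a minimal Python sketch. It assumes a linear hypermodel $\theta = A z + b$ driven by a standard Gaussian index $z$, and it uses a fixed feature vector as a stand-in for the learned features $\phi(s)$. The names `LinearHypermodel`, `q_values`, and `select_action` are hypothetical, and the jointly optimized training objective is omitted because its form is not stated here. Sampling one index $z$ and then acting greedily with the resulting $Q$-values illustrates how approximate posterior samples can drive exploration.

```python
import numpy as np


class LinearHypermodel:
    """Hypothetical sketch of a hypermodel: maps an index z ~ N(0, I) to the
    weights of a linear Q-value head via theta = A z + b (one weight vector
    per action). Training of A, b, and the features is NOT shown."""

    def __init__(self, feature_dim, num_actions, index_dim, rng):
        self.feature_dim = feature_dim
        self.num_actions = num_actions
        self.index_dim = index_dim
        out_dim = feature_dim * num_actions
        # Hypermodel parameters; here only randomly initialized (assumption).
        self.A = 0.1 * rng.standard_normal((out_dim, index_dim))
        self.b = np.zeros(out_dim)

    def sample_index(self, rng):
        # Draw a random index z from a standard Gaussian.
        return rng.standard_normal(self.index_dim)

    def base_params(self, z):
        # Generate the base-model (Q-head) parameters for this index.
        theta = self.A @ z + self.b
        return theta.reshape(self.num_actions, self.feature_dim)


def q_values(features, theta):
    # features: (feature_dim,), theta: (num_actions, feature_dim)
    return theta @ features


def select_action(features, theta):
    # Greedy action under one sampled Q-value function.
    return int(np.argmax(q_values(features, theta)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hyper = LinearHypermodel(feature_dim=4, num_actions=3, index_dim=2, rng=rng)
    phi = rng.standard_normal(4)  # stand-in for learned features phi(s)
    for episode in range(3):
        z = hyper.sample_index(rng)  # e.g., one sample per episode
        theta = hyper.base_params(z)
        print(episode, q_values(phi, theta), select_action(phi, theta))
```

Drawing a fresh index (for example, once per episode) yields a different $Q$-value function and hence possibly different greedy actions. In this sketch, the cost of one sample is a single matrix-vector product $A z + b$, which does not depend on the feature values.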

Author Information

Ziniu Li (The Chinese University of Hong Kong, Shenzhen)
Yingru Li (The Chinese University of Hong Kong, Shenzhen)
Yushun Zhang (The Chinese University of Hong Kong, Shenzhen)

I am a Ph.D. student under the supervision of Prof. Tom Zhi-Quan Luo and Prof. Tong Zhang. I am interested in understanding deep learning.

Tong Zhang (Tencent AI Lab)
Zhiquan Luo (The Chinese University of Hong Kong, Shenzhen and Shenzhen Research Institute of Big Data)
