In reinforcement learning, experience replay stores past samples for later reuse. Prioritized sampling is a promising technique for making better use of these samples. Previous prioritization criteria include TD error, recency, and corrective feedback, most of which are heuristically designed. In this work, we start from the regret minimization objective and derive an optimal prioritization strategy for the Bellman update that directly maximizes the return of the policy. The theory suggests that data with higher hindsight TD error, better on-policiness, and more accurate Q values should be assigned higher weights during sampling. Most previous criteria therefore capture this strategy only partially. We not only provide theoretical justification for previous criteria, but also propose two new methods for computing the prioritization weight, namely ReMERN and ReMERT. ReMERN learns an error network, while ReMERT exploits the temporal ordering of states. Both methods outperform previous prioritized sampling algorithms on challenging RL benchmarks, including MuJoCo, Atari, and Meta-World.
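To make the idea of prioritized sampling concrete, the following is a minimal sketch of a replay buffer that samples transitions in proportion to the magnitude of their TD error, one of the previous criteria mentioned above. It is not ReMERN or ReMERT, and it does not implement the weighting derived in this work. The class name `PrioritizedReplayBuffer`, the `eps` offset, and the importance-weight normalization are assumptions chosen for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Toy replay buffer sampling transitions in proportion to |TD error|.

    Hypothetical illustration only; not the method proposed in the paper.
    """

    def __init__(self, capacity, eps=1e-3, seed=None):
        self.capacity = capacity
        self.eps = eps  # assumed small offset so zero-error samples stay reachable
        self.items = []
        self.priorities = []
        self.next_idx = 0  # slot that the next add() writes to
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        priority = abs(td_error) + self.eps
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest transition.
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.items)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        # Draw indices with replacement, proportionally to priority.
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance weights offset the bias of non-uniform sampling;
        # they are scaled so that the largest weight in the batch is 1.
        is_weights = [1.0 / (n * probs[i]) for i in idxs]
        max_w = max(is_weights)
        is_weights = [w / max_w for w in is_weights]
        return idxs, [self.items[i] for i in idxs], is_weights

    def update(self, idxs, td_errors):
        # Refresh priorities after the learner recomputes TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop under these assumptions, the learner would call `sample`, scale each sample's loss by the returned importance weight, and then call `update` with the recomputed TD errors. Swapping in a different criterion only changes how the priority is computed in `add` and `update`.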
Author Information
Xu-Hui Liu (Nanjing University)
Zhenghai Xue (Nanjing University)
Jingcheng Pang (Nanjing University)
Shengyi Jiang (The University of Hong Kong)
Feng Xu (Nanjing University)
Yang Yu (Nanjing University)
More from the Same Authors
- 2021 Poster: Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning
  Xiong-Hui Chen · Shengyi Jiang · Feng Xu · Zongzhang Zhang · Yang Yu
- 2021 Poster: Adaptive Online Packing-guided Search for POMDPs
  Chenyang Wu · Guoyu Yang · Zongzhang Zhang · Yang Yu · Dong Li · Wulong Liu · Jianye Hao
- 2021 Poster: Offline Model-based Adaptable Policy Learning
  Xiong-Hui Chen · Yang Yu · Qingyang Li · Fan-Ming Luo · Zhiwei Qin · Wenjie Shang · Jieping Ye
- 2020 Poster: Offline Imitation Learning with a Misspecified Simulator
  Shengyi Jiang · Jingcheng Pang · Yang Yu
- 2018 Poster: Multi-Layered Gradient Boosting Decision Trees
  Ji Feng · Yang Yu · Zhi-Hua Zhou
- 2015 Poster: Subset Selection by Pareto Optimization
  Chao Qian · Yang Yu · Zhi-Hua Zhou