In reinforcement learning, a promising direction to avoid online trial-and-error costs is learning from an offline dataset. Current offline reinforcement learning methods commonly constrain policy learning to the in-support regions of the offline dataset in order to ensure the robustness of the resulting policies. Such constraints, however, also limit the potential of the resulting policies. In this paper, to release the potential of offline policy learning, we investigate decision-making problems in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). With this approach, instead of learning in in-support regions, we learn an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We conduct experiments on MuJoCo control tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieves better performance than state-of-the-art algorithms.
Author Information
Xiong-Hui Chen (Nanjing University)
Yang Yu (Nanjing University)
Qingyang Li (Didi AI Labs)
https://scholar.google.com/citations?user=Pd50HpAAAAAJ&hl=en
Fan-Ming Luo (Nanjing University)
Zhiwei Qin (Didi Research America)
Wenjie Shang (Nanjing University)
Jieping Ye (University of Michigan)
More from the Same Authors
- 2022 Poster: NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
  Rong-Jun Qin · Xingyuan Zhang · Songyi Gao · Xiong-Hui Chen · Zewen Li · Weinan Zhang · Yang Yu
- 2021 Poster: Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning
  Xiong-Hui Chen · Shengyi Jiang · Feng Xu · Zongzhang Zhang · Yang Yu
- 2021 Poster: Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
  Xu-Hui Liu · Zhenghai Xue · Jingcheng Pang · Shengyi Jiang · Feng Xu · Yang Yu
- 2021 Poster: Adaptive Online Packing-guided Search for POMDPs
  Chenyang Wu · Guoyu Yang · Zongzhang Zhang · Yang Yu · Dong Li · Wulong Liu · Jianye Hao
- 2018 Poster: Multi-Layered Gradient Boosting Decision Trees
  Ji Feng · Yang Yu · Zhi-Hua Zhou
- 2015 Poster: Subset Selection by Pareto Optimization
  Chao Qian · Yang Yu · Zhi-Hua Zhou