Timezone: »
Offline reinforcement learning (RL) aims at learning effective policies from historical data without extra environment interactions. During our experience of applying offline RL, we noticed that previous offline RL benchmarks commonly involve significant reality gaps, which we have identified include rich and overly exploratory datasets, degraded baseline, and missing policy validation. In many real-world situations, to ensure system safety, running an overly exploratory policy to collect various data is prohibited, thus only a narrow data distribution is available. The resulting policy is regarded as effective if it is better than the working behavior policy; the policy model can be deployed only if it has been well validated, rather than accomplished the training. In this paper, we present a Near real-world offline RL benchmark, named NeoRL, to reflect these properties. NeoRL datasets are collected with a more conservative strategy. Moreover, NeoRL contains the offline training and offline validation pipeline before the online test, corresponding to real-world situations. We then evaluate recent state-of-the-art offline RL algorithms in NeoRL. The empirical results demonstrate that some offline RL algorithms are less competitive to the behavior cloning and the deterministic behavior policy, implying that they could be less effective in real-world tasks than in the previous benchmarks. We also disclose that current offline policy evaluation methods could hardly select the best policy. We hope this work will shed some light on future research and deploying RL in real-world systems.
Author Information
Rong-Jun Qin (Nanjing University)
Xingyuan Zhang (Machine Learning Research Lab)
Songyi Gao
Xiong-Hui Chen (Nanjing University)
Zewen Li (Polixir.AI)
Weinan Zhang (Shanghai Jiao Tong University)
Yang Yu (Nanjing University)
More from the Same Authors
-
2022 Poster: Learning Enhanced Representation for Tabular Data via Neighborhood Propagation »
Kounianhua Du · Weinan Zhang · Ruiwen Zhou · Yangkun Wang · Xilong Zhao · Jiarui Jin · Quan Gan · Zheng Zhang · David P Wipf -
2022 Poster: Efficient Multi-agent Communication via Self-supervised Information Aggregation »
Cong Guan · Feng Chen · Lei Yuan · Chenghe Wang · Hao Yin · Zongzhang Zhang · Yang Yu -
2022 : Multi-Agent Policy Transfer via Task Relationship Modeling »
Rong-Jun Qin · Feng Chen · Tonghan Wang · Lei Yuan · Xiaoran Wu · Yipeng Kang · Zongzhang Zhang · Chongjie Zhang · Yang Yu -
2022 : Visual Imitation Learning with Patch Rewards »
Minghuan Liu · Tairan He · Weinan Zhang · Shuicheng Yan · Zhongwen Xu -
2022 : Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents »
Minghuan Liu · Zhengbang Zhu · Menghui Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao -
2023 Poster: Lending Interaction Wings to Recommender Systems with Conversational Agents »
Jiarui Jin · Xianyu Chen · Fanghua Ye · Mengyue Yang · Yue Feng · Weinan Zhang · Yong Yu · Jun Wang -
2023 Poster: Imitation Learning from Imperfection: Theoretical Justifications and Algorithms »
Ziniu Li · Tian Xu · Zeyu Qin · Yang Yu · Zhi-Quan Luo -
2023 Poster: Adversarial Counterfactual Environment Model Learning »
Xiong-Hui Chen · Yang Yu · Zhengmao Zhu · ZhiHua Yu · Chen Zhenjun · Chenghe Wang · Yinan Wu · Rong-Jun Qin · Hongqiu Wu · Ruijin Ding · Huang Fangsheng -
2023 Poster: Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning »
Haoran He · Chenjia Bai · Kang Xu · Zhuoran Yang · Weinan Zhang · Dong Wang · Bin Zhao · Xuelong Li -
2023 Poster: Natural Language-conditioned Reinforcement Learning with Task-related Language Development and Translation »
Jingcheng Pang · Xin-Yu Yang · Si-Hang Yang · Xiong-Hui Chen · Yang Yu -
2023 Poster: Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models »
Xingyuan Zhang · Philip Becker-Ehmck · Patrick van der Smagt · Maximilian Karl -
2023 Poster: Learning World Models with Identifiable Factorization »
Yuren Liu · Biwei Huang · Zhengmao Zhu · Honglong Tian · Mingming Gong · Yang Yu · Kun Zhang -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Multi-agent Dynamic Algorithm Configuration »
Ke Xue · Jiacheng Xu · Lei Yuan · Miqing Li · Chao Qian · Zongzhang Zhang · Yang Yu -
2022 Spotlight: Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning »
Chenyang Wu · Tianci Li · Zongzhang Zhang · Yang Yu -
2022 Spotlight: Lightning Talks 4B-1 »
Alexandra Senderovich · Zhijie Deng · Navid Ansari · Xuefei Ning · Yasmin Salehi · Xiang Huang · Chenyang Wu · Kelsey Allen · Jiaqi Han · Nikita Balagansky · Tatiana Lopez-Guevara · Tianci Li · Zhanhong Ye · Zixuan Zhou · Feng Zhou · Ekaterina Bulatova · Daniil Gavrilov · Wenbing Huang · Dennis Giannacopoulos · Hans-peter Seidel · Anton Obukhov · Kimberly Stachenfeld · Hongsheng Liu · Jun Zhu · Junbo Zhao · Hengbo Ma · Nima Vahidi Ferdowsi · Zongzhang Zhang · Vahid Babaei · Jiachen Li · Alvaro Sanchez Gonzalez · Yang Yu · Shi Ji · Maxim Rakhuba · Tianchen Zhao · Yiping Deng · Peter Battaglia · Josh Tenenbaum · Zidong Wang · Chuang Gan · Changcheng Tang · Jessica Hamrick · Kang Yang · Tobias Pfaff · Yang Li · Shuang Liang · Min Wang · Huazhong Yang · Haotian CHU · Yu Wang · Fan Yu · Bei Hua · Lei Chen · Bin Dong -
2022 Poster: Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning »
Hua Wei · Jingxiao Chen · Xiyang Ji · Hongyang Qin · Minwen Deng · Siqin Li · Liang Wang · Weinan Zhang · Yong Yu · Liu Linc · Lanxiao Huang · Deheng Ye · Qiang Fu · Wei Yang -
2022 Poster: Reinforcement Learning with Automated Auxiliary Loss Search »
Tairan He · Yuge Zhang · Kan Ren · Minghuan Liu · Che Wang · Weinan Zhang · Yuqing Yang · Dongsheng Li -
2022 Poster: Bootstrapped Transformer for Offline Reinforcement Learning »
Kerong Wang · Hanye Zhao · Xufang Luo · Kan Ren · Weinan Zhang · Dongsheng Li -
2022 Poster: PerfectDou: Dominating DouDizhu with Perfect Information Distillation »
Guan Yang · Minghuan Liu · Weijun Hong · Weinan Zhang · Fei Fang · Guangjun Zeng · Yue Lin -
2022 Poster: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem »
Muning Wen · Jakub Kuba · Runji Lin · Weinan Zhang · Ying Wen · Jun Wang · Yaodong Yang -
2022 Poster: Multi-agent Dynamic Algorithm Configuration »
Ke Xue · Jiacheng Xu · Lei Yuan · Miqing Li · Chao Qian · Zongzhang Zhang · Yang Yu -
2022 Poster: Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning »
Chenyang Wu · Tianci Li · Zongzhang Zhang · Yang Yu -
2021 : More Efficient Adversarial Imitation Learning Algorithms With Known and Unknown Transitions »
Tian Xu · Ziniu Li · Yang Yu -
2021 Poster: Curriculum Offline Imitating Learning »
Minghuan Liu · Hanye Zhao · Zhengyu Yang · Jian Shen · Weinan Zhang · Li Zhao · Tie-Yan Liu -
2021 Poster: Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning »
Xiong-Hui Chen · Shengyi Jiang · Feng Xu · Zongzhang Zhang · Yang Yu -
2021 Poster: On Effective Scheduling of Model-based Reinforcement Learning »
Hang Lai · Jian Shen · Weinan Zhang · Yimin Huang · Xing Zhang · Ruiming Tang · Yong Yu · Zhenguo Li -
2021 Poster: Offline Model-based Adaptable Policy Learning »
Xiong-Hui Chen · Yang Yu · Qingyang Li · Fan-Ming Luo · Zhiwei Qin · Wenjie Shang · Jieping Ye -
2020 Poster: Efficient Projection-free Algorithms for Saddle Point Problems »
Cheng Chen · Luo Luo · Weinan Zhang · Yong Yu -
2020 Poster: Error Bounds of Imitating Policies and Environments »
Tian Xu · Ziniu Li · Yang Yu -
2020 Poster: Offline Imitation Learning with a Misspecified Simulator »
Shengyi Jiang · Jingcheng Pang · Yang Yu -
2020 Poster: Model-based Policy Optimization with Unsupervised Model Adaptation »
Jian Shen · Han Zhao · Weinan Zhang · Yong Yu -
2020 Spotlight: Model-based Policy Optimization with Unsupervised Model Adaptation »
Jian Shen · Han Zhao · Weinan Zhang · Yong Yu -
2019 Poster: Bridging Machine Learning and Logical Reasoning by Abductive Learning »
Wang-Zhou Dai · Qiuling Xu · Yang Yu · Zhi-Hua Zhou -
2017 Demonstration: MAgent: A Many-Agent Reinforcement Learning Research Platform for Artificial Collective Intelligence »
Lianmin Zheng · Jiacheng Yang · Han Cai · Weinan Zhang · Jun Wang · Yong Yu -
2017 Poster: Subset Selection under Noise »
Chao Qian · Jing-Cheng Shi · Yang Yu · Ke Tang · Zhi-Hua Zhou