The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. RL agents can learn offline from such data, but deploying them while they are still learning might be dangerous in domains where safety is critical. It is therefore essential to estimate how a newly learned agent will perform in the target environment before actually deploying it, and to do so without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning that uses approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a training set, used to learn an offline policy, and a test set, used to estimate a lower bound on the performance of the offline policy using off-policy evaluation with bootstrapping. A lower-bound estimate tells us how well a newly learned target policy would perform before it is deployed in the real environment, and therefore allows us to decide when to deploy our learned policy.
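The abstract does not say which off-policy estimator or which bootstrap variant is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes ordinary per-trajectory importance sampling as the estimator and a percentile bootstrap for the lower bound. All names (`importance_weighted_return`, `bootstrap_lower_bound`, `ready_to_deploy`, `target_prob`, `behavior_prob`, `delta`, `threshold`) are hypothetical and chosen for illustration.

```python
import random


def importance_weighted_return(trajectory, target_prob, behavior_prob, gamma=1.0):
    """Importance-weighted discounted return of one logged trajectory.

    trajectory: list of (state, action, reward) tuples collected by the
    behavior policy. target_prob and behavior_prob are callables mapping
    (state, action) to an action probability (assumed interface).
    """
    weight, ret, discount = 1.0, 0.0, 1.0
    for state, action, reward in trajectory:
        weight *= target_prob(state, action) / behavior_prob(state, action)
        ret += discount * reward
        discount *= gamma
    return weight * ret


def bootstrap_lower_bound(test_set, target_prob, behavior_prob,
                          delta=0.05, n_bootstrap=2000, gamma=1.0, seed=0):
    """Approximate (1 - delta) lower bound on the target policy's value.

    Resamples the held-out test trajectories with replacement, computes the
    importance sampling estimate on each resample, and returns the
    delta-quantile of those estimates (percentile bootstrap; an assumption).
    """
    rng = random.Random(seed)
    values = [importance_weighted_return(t, target_prob, behavior_prob, gamma)
              for t in test_set]
    estimates = []
    for _ in range(n_bootstrap):
        resample = [rng.choice(values) for _ in values]
        estimates.append(sum(resample) / len(resample))
    estimates.sort()
    index = min(int(delta * n_bootstrap), n_bootstrap - 1)
    return estimates[index]


def ready_to_deploy(lower_bound, threshold):
    """Hypothetical decision rule: deploy only if the bound clears a threshold."""
    return lower_bound >= threshold
```

In this sketch the bound is only approximate: a percentile bootstrap does not carry the strict guarantee of a concentration-inequality bound, which matches the "approximate" qualifier in the abstract but may differ from the procedure the authors actually use. The threshold passed to `ready_to_deploy` could, for example, be the estimated performance of the behavior policy that generated the data.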
Author Information
Hager Radi (University of Alberta)
A first-year MSc student in computing science at the University of Alberta.
Josiah Hanna (University of Wisconsin -- Madison)
Peter Stone (The University of Texas at Austin, Sony AI)
Matthew Taylor (University of Alberta)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 : Safe Evaluation For Offline Learning: Are We Ready To Deploy? »
More from the Same Authors
-
2020 : Paper 19: Multiagent Driving Policy for Congestion Reduction in a Large Scale Scenario »
Jiaxun Cui · Peter Stone -
2021 : Task-Independent Causal State Abstraction »
Zizhao Wang · Xuesu Xiao · Yuke Zhu · Peter Stone -
2021 : Leveraging Information about Background Music in Human-Robot Interaction »
Elad Liebman · Peter Stone -
2021 : Robust On-Policy Data Collection for Data-Efficient Policy Evaluation »
Rujie Zhong · Josiah Hanna · Lukas Schäfer · Stefano Albrecht -
2022 Workshop: Deep Reinforcement Learning Workshop »
Karol Hausman · Qi Zhang · Matthew Taylor · Martha White · Suraj Nair · Manan Tomar · Risto Vuorio · Ted Xiao · Zeyu Zheng -
2022 Workshop: Reinforcement Learning for Real Life (RL4RealLife) Workshop »
Yuxi Li · Emma Brunskill · MINMIN CHEN · Omer Gottesman · Lihong Li · Yao Liu · Zhiwei Tony Qin · Matthew Taylor -
2021 : Learning Representations for Pixel-based Control: What Matters and Why? »
Manan Tomar · Utkarsh A Mishra · Amy Zhang · Matthew Taylor -
2021 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · David Silver · Matthew Taylor · Martha White · Srijita Das · Yuqing Du · Andrew Patterson · Manan Tomar · Olivia Watkins -
2021 Poster: Adversarial Intrinsic Motivation for Reinforcement Learning »
Ishan Durugkar · Mauricio Tec · Scott Niekum · Peter Stone -
2021 Poster: Conflict-Averse Gradient Descent for Multi-task learning »
Bo Liu · Xingchao Liu · Xiaojie Jin · Peter Stone · Qiang Liu -
2021 Poster: Machine versus Human Attention in Deep Reinforcement Learning Tasks »
Sihang Guo · Ruohan Zhang · Bo Liu · Yifeng Zhu · Dana Ballard · Mary Hayhoe · Peter Stone -
2020 : Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator] »
Peter Stone · Natasha Jaques -
2020 : Invited Speaker: Peter Stone (The University of Texas at Austin) on Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination »
Peter Stone -
2020 : Panel discussion »
Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel -
2020 : Discussion Panel »
Pete Florence · Dorsa Sadigh · Carolina Parada · Jeannette Bohg · Roberto Calandra · Peter Stone · Fabio Ramos -
2020 : Invited talk: Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" »
Peter Stone -
2020 : Contributed Talk: Maximum Reward Formulation In Reinforcement Learning »
Vijaya Sai Krishna Gottipati · Yashaswi Pathak · Rohan Nuttall · Sahir . · Raviteja Chunduru · Ahmed Touati · Sriram Ganapathi · Matthew Taylor · Sarath Chandar -
2020 Poster: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks »
Lemeng Wu · Bo Liu · Peter Stone · Qiang Liu -
2020 Poster: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch »
Siddharth Desai · Ishan Durugkar · Haresh Karnan · Garrett Warnell · Josiah Hanna · Peter Stone -
2018 : Peter Stone »
Peter Stone -
2018 : Control Algorithms for Imitation Learning from Observation »
Peter Stone -
2018 : Peter Stone »
Peter Stone -
2016 : Peter Stone (University of Texas at Austin) »
Peter Stone -
2015 Workshop: Learning, Inference and Control of Multi-Agent Systems »
Vicenç Gómez · Gerhard Neumann · Jonathan S Yedidia · Peter Stone