The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. RL agents can learn offline from such data, but deploying them while they are still learning may be dangerous in safety-critical domains. It is therefore essential to estimate how a newly learned agent will perform in the target environment before actually deploying it, and without the risk of overestimating its true performance. To achieve this, we introduce a framework for safe evaluation of offline learning that uses approximate high-confidence off-policy evaluation (HCOPE) to estimate the performance of offline policies during learning. In our setting, we assume a source of data, which we split into a train set, used to learn an offline policy, and a test set, used to estimate a lower bound on the offline policy's performance using off-policy evaluation with bootstrapping. A lower-bound estimate tells us how well a newly learned target policy would perform before it is deployed in the real environment, and therefore allows us to decide when to deploy our learned policy.
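The evaluation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes ordinary per-trajectory importance sampling as the off-policy estimator and a percentile bootstrap for the approximate lower bound, neither of which the abstract specifies. The function names, the `(state, action, reward)` trajectory layout, and the `target_prob` / `behavior_prob` callables are all hypothetical.

```python
import numpy as np


def importance_sampled_returns(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Ordinary importance-sampled return for each test-set trajectory.

    Assumed layout: each trajectory is a list of (state, action, reward)
    tuples; target_prob(s, a) and behavior_prob(s, a) return the action
    probabilities under the learned policy and the data-collecting policy.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return np.asarray(estimates, dtype=float)


def bootstrap_lower_bound(estimates, delta=0.05, n_boot=2000, seed=0):
    """Approximate (1 - delta) lower bound on the mean via percentile bootstrap."""
    estimates = np.asarray(estimates, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(estimates)
    # Resample trajectories with replacement and average each resample.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = estimates[idx].mean(axis=1)
    return float(np.quantile(boot_means, delta))
```

Under these assumptions, one would pass the held-out test-set trajectories to `importance_sampled_returns` after each round of offline training and deploy only once `bootstrap_lower_bound` exceeds a chosen performance threshold. The bootstrap bound is approximate, so it holds only roughly at the stated confidence level.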
Author Information
Hager Radi (University of Alberta)
A first-year MSc student in computing science at the University of Alberta.
Josiah Hanna (University of Wisconsin -- Madison)
Peter Stone (The University of Texas at Austin, Sony AI)
Matthew Taylor (University of Alberta)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 : Safe Evaluation For Offline Learning: Are We Ready To Deploy? »
More from the Same Authors
-
2020 : Paper 19: Multiagent Driving Policy for Congestion Reduction in a Large Scale Scenario »
Jiaxun Cui · Peter Stone -
2021 : Task-Independent Causal State Abstraction »
Zizhao Wang · Xuesu Xiao · Yuke Zhu · Peter Stone -
2021 : Leveraging Information about Background Music in Human-Robot Interaction »
Elad Liebman · Peter Stone -
2021 : Robust On-Policy Data Collection for Data-Efficient Policy Evaluation »
Rujie Zhong · Josiah Hanna · Lukas Schäfer · Stefano Albrecht -
2022 Poster: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 : BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach »
Mao Ye · Bo Liu · Stephen Wright · Peter Stone · Qiang Liu -
2022 : ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning »
Eddy Hudson · Ishan Durugkar · Garrett Warnell · Peter Stone -
2022 : Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction »
Brahma Pavse · Josiah Hanna -
2022 : Fifteen-minute Competition Overview Video »
Tianpei Yang · Iuliia Kotseruba · Montgomery Alban · Amir Rasouli · Soheil Mohamad Alizadeh Shabestary · Randolph Goebel · Matthew Taylor · Liam Paull · Florian Shkurti -
2022 : Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning »
Mhairi Dunion · Trevor McInroe · Kevin Sebastian Luck · Josiah Hanna · Stefano Albrecht -
2022 : Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning »
Chaitanya Kharyal · Tanmay Sinha · Vijaya Sai Krishna Gottipati · Srijita Das · Matthew Taylor -
2022 Workshop: Deep Reinforcement Learning Workshop »
Karol Hausman · Qi Zhang · Matthew Taylor · Martha White · Suraj Nair · Manan Tomar · Risto Vuorio · Ted Xiao · Zeyu Zheng -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · Yan Zheng · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · Yi Wu · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Competition: Driving SMARTS »
Amir Rasouli · Matthew Taylor · Iuliia Kotseruba · Tianpei Yang · Randolph Goebel · Soheil Mohamad Alizadeh Shabestary · Montgomery Alban · Florian Shkurti · Liam Paull -
2022 : Panel RL Theory-Practice Gap »
Peter Stone · Matej Balog · Jonas Buchli · Jason Gauci · Dhruv Madeka -
2022 : Panel RL Benchmarks »
Minmin Chen · Pablo Samuel Castro · Caglar Gulcehre · Tony Jebara · Peter Stone -
2022 : Invited talk: Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning »
Peter Stone -
2022 Workshop: Reinforcement Learning for Real Life (RL4RealLife) Workshop »
Yuxi Li · Emma Brunskill · Minmin Chen · Omer Gottesman · Lihong Li · Yao Liu · Zhiwei Tony Qin · Matthew Taylor -
2022 : Human in the Loop Learning for Robot Navigation and Task Learning from Implicit Human Feedback »
Peter Stone -
2022 Poster: Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning »
Rujie Zhong · Duohan Zhang · Lukas Schäfer · Stefano Albrecht · Josiah Hanna -
2022 Poster: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach »
Bo Liu · Mao Ye · Stephen Wright · Peter Stone · Qiang Liu -
2022 Poster: Value Function Decomposition for Iterative Design of Reinforcement Learning Agents »
James MacGlashan · Evan Archer · Alisa Devlic · Takuma Seno · Craig Sherstan · Peter Wurman · Peter Stone -
2021 : Learning Representations for Pixel-based Control: What Matters and Why? »
Manan Tomar · Utkarsh A Mishra · Amy Zhang · Matthew Taylor -
2021 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · David Silver · Matthew Taylor · Martha White · Srijita Das · Yuqing Du · Andrew Patterson · Manan Tomar · Olivia Watkins -
2021 Poster: Adversarial Intrinsic Motivation for Reinforcement Learning »
Ishan Durugkar · Mauricio Tec · Scott Niekum · Peter Stone -
2021 Poster: Conflict-Averse Gradient Descent for Multi-task learning »
Bo Liu · Xingchao Liu · Xiaojie Jin · Peter Stone · Qiang Liu -
2021 Poster: Machine versus Human Attention in Deep Reinforcement Learning Tasks »
Sihang Guo · Ruohan Zhang · Bo Liu · Yifeng Zhu · Dana Ballard · Mary Hayhoe · Peter Stone -
2020 : Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator] »
Peter Stone · Natasha Jaques -
2020 : Invited Speaker: Peter Stone (The University of Texas at Austin) on Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination »
Peter Stone -
2020 : Panel discussion »
Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel -
2020 : Discussion Panel »
Pete Florence · Dorsa Sadigh · Carolina Parada · Jeannette Bohg · Roberto Calandra · Peter Stone · Fabio Ramos -
2020 : Invited talk: Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" »
Peter Stone -
2020 : Contributed Talk: Maximum Reward Formulation In Reinforcement Learning »
Vijaya Sai Krishna Gottipati · Yashaswi Pathak · Rohan Nuttall · Sahir . · Raviteja Chunduru · Ahmed Touati · Sriram Ganapathi · Matthew Taylor · Sarath Chandar -
2020 Poster: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks »
Lemeng Wu · Bo Liu · Peter Stone · Qiang Liu -
2020 Poster: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch »
Siddharth Desai · Ishan Durugkar · Haresh Karnan · Garrett Warnell · Josiah Hanna · Peter Stone -
2018 : Peter Stone »
Peter Stone -
2018 : Control Algorithms for Imitation Learning from Observation »
Peter Stone -
2016 : Peter Stone (University of Texas at Austin) »
Peter Stone -
2015 Workshop: Learning, Inference and Control of Multi-Agent Systems »
Vicenç Gómez · Gerhard Neumann · Jonathan S Yedidia · Peter Stone