Timezone: »
Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for evaluating a policy without requiring it to ever be deployed. Importance sampling is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem: the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, and show that combining these policies with importance sampling can significantly improve performance for long-horizon problems. In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling. We further generalize these special cases to a general covariance testing rule that can be used to decide which weights to drop in an IS estimate, and derive a new IS algorithm called Incremental Importance Sampling that can provide significantly more accurate estimates for a broad class of domains.
Author Information
Zhaohan Guo (Carnegie Mellon University/Stanford)
Philip S. Thomas (CMU)
Emma Brunskill (Stanford University)
More from the Same Authors
-
2021 : Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation »
Ramtin Keramati · Omer Gottesman · Leo Celi · Finale Doshi-Velez · Emma Brunskill -
2023 Poster: In-Context Decision-Making from Supervised Pretraining »
Jonathan N Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 Poster: Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan N Lee · Emma Brunskill -
2023 Poster: Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization »
Sanath Kumar Krishnamurthy · Ruohan Zhan · Susan Athey · Emma Brunskill -
2023 Poster: Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets »
Anirudhan Badrinath · Yannis Flet-Berliac · Allen Nie · Emma Brunskill -
2022 Workshop: Reinforcement Learning for Real Life (RL4RealLife) Workshop »
Yuxi Li · Emma Brunskill · MINMIN CHEN · Omer Gottesman · Lihong Li · Yao Liu · Zhiwei Tony Qin · Matthew Taylor -
2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
Jonathan N Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill -
2022 Poster: Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits »
Tong Mu · Yash Chandak · Tatsunori Hashimoto · Emma Brunskill -
2022 Poster: Off-Policy Evaluation for Action-Dependent Non-stationary Environments »
Yash Chandak · Shiv Shankar · Nathaniel Bastian · Bruno da Silva · Emma Brunskill · Philip Thomas -
2022 Poster: Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data »
Allen Nie · Yannis Flet-Berliac · Deon Jordan · William Steenbergen · Emma Brunskill -
2022 Poster: Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2021 : Retrospective Panel »
Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal -
2021 : Safe RL Debate »
Sylvia Herbert · Animesh Garg · Emma Brunskill · Aleksandra Faust · Dylan Hadfield-Menell -
2021 Poster: Play to Grade: Testing Coding Games as Classifying Markov Decision Process »
Allen Nie · Emma Brunskill · Chris Piech -
2021 Poster: Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes »
HyunJi Alex Nam · Scott Fleming · Emma Brunskill -
2021 Poster: Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin J Wainwright · Emma Brunskill -
2021 Poster: Universal Off-Policy Evaluation »
Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas -
2021 Poster: Design of Experiments for Stochastic Contextual Linear Bandits »
Andrea Zanette · Kefan Dong · Jonathan N Lee · Emma Brunskill -
2020 : Counterfactuals and Offline RL »
Emma Brunskill -
2020 : Q & A and Panel Session with Dan Weld, Kristen Grauman, Scott Yih, Emma Brunskill, and Alex Ratner »
Kristen Grauman · Wen-tau Yih · Alexander Ratner · Emma Brunskill · Douwe Kiela · Daniel S. Weld -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 : Mini-panel discussion 1 - Bridging the gap between theory and practice »
Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz -
2020 : Keynote: Emma Brunskill »
Emma Brunskill -
2020 : Panel discussion on minimizing bias in machine learning in education »
Neil Heffernan · Osonde A. Osoba · Emma Brunskill · Kathi Fisler -
2020 Poster: Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding »
Hongseok Namkoong · Ramtin Keramati · Steve Yadlowsky · Emma Brunskill -
2020 Poster: Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2020 Poster: Provably Good Batch Reinforcement Learning Without Great Exploration »
Yao Liu · Adith Swaminathan · Alekh Agarwal · Emma Brunskill -
2019 : Emma Brünskill, "Some Theory RL Challenges Inspired by Education" »
Emma Brunskill -
2019 : Invited Talk »
Emma Brunskill -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Offline Contextual Bandits with High Probability Fairness Guarantees »
Blossom Metevier · Stephen Giguere · Sarah Brockman · Ari Kobren · Yuriy Brun · Emma Brunskill · Philip Thomas -
2019 Poster: Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model »
Andrea Zanette · Mykel J Kochenderfer · Emma Brunskill -
2019 Poster: Limiting Extrapolation in Linear Approximate Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation »
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill -
2018 Demonstration: Automatic Curriculum Generation Applied to Teaching Novices a Short Bach Piano Segment »
Emma Brunskill · Tong Mu · Karan Goel · Jonathan Bragg -
2017 : Panel Discussion »
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox -
2017 : Sample efficiency and off policy hierarchical RL (Emma Brunskill) »
Emma Brunskill -
2017 : Emma Brunskill (Stanford) »
Emma Brunskill -
2017 : Invited Talk »
Emma Brunskill -
2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Tutorial: Reinforcement Learning with People »
Emma Brunskill