We consider the evaluation and training of a new policy for the evaluation data using historical data obtained from a different policy. The goal of off-policy evaluation (OPE) is to estimate the expected reward of a new policy over the evaluation data, and that of off-policy learning (OPL) is to find a new policy that maximizes the expected reward over the evaluation data. Although standard OPE and OPL assume that the covariate distribution is the same in the historical and evaluation data, a covariate shift often arises, i.e., the covariate distribution of the historical data differs from that of the evaluation data. In this paper, we derive the efficiency bound of OPE under a covariate shift. Then, we propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using an estimator of the density ratio between the distributions of the historical and evaluation data. We also discuss other possible estimators and compare their theoretical properties. Finally, we confirm the effectiveness of the proposed estimators through experiments.
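As a rough illustration of the idea in the abstract, the sketch below combines a direct (model-based) term evaluated on the evaluation covariates with an importance-weighted correction term on the historical data, where the correction is reweighted by an estimated density ratio between the two covariate distributions. This is a minimal sketch assuming a finite action space and plug-in nuisance estimates; the names `dr_ope_covariate_shift`, `pi_e`, `pi_b_hat`, `q_hat`, and `w_hat` are illustrative assumptions, not the paper's exact efficient estimator.

```python
import numpy as np

def dr_ope_covariate_shift(x_hist, a_hist, r_hist, x_eval,
                           pi_e, pi_b_hat, q_hat, w_hat, n_actions):
    """Schematic doubly robust OPE value estimate under covariate shift.

    pi_e(a, x)     -> evaluation-policy probability of action a at covariate x
    pi_b_hat(a, x) -> estimated behavior-policy probability of action a at x
    q_hat(x, a)    -> estimated expected reward of action a at covariate x
    w_hat(x)       -> estimated density ratio p_eval(x) / p_hist(x)
    Actions are assumed to lie in {0, ..., n_actions - 1}.
    """
    # Direct-method term: model-based value averaged over the *evaluation* covariates.
    dm = np.mean([
        sum(pi_e(a, x) * q_hat(x, a) for a in range(n_actions))
        for x in x_eval
    ])

    # Correction term: importance-weighted residuals on the *historical* data,
    # reweighted by the covariate density ratio so that the historical covariate
    # distribution is matched to the evaluation distribution.
    iw = np.array([pi_e(a, x) / pi_b_hat(a, x) for x, a in zip(x_hist, a_hist)])
    w = np.array([w_hat(x) for x in x_hist])
    resid = np.asarray(r_hist) - np.array([q_hat(x, a) for x, a in zip(x_hist, a_hist)])
    return dm + np.mean(w * iw * resid)
```

The familiar doubly robust intuition carries over to this sketch: if the outcome model `q_hat` is accurate, the correction term is close to zero in expectation, while if the behavior-policy and density-ratio weights are accurate, the correction repairs errors in `q_hat`. The paper's contribution is the efficiency analysis and the estimators built on this idea under a covariate shift.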
Author Information
Masatoshi Uehara (Cornell University)
Masahiro Kato (Cyberagent.Inc)
Shota Yasui (Cyberagent)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift »
  Wed. Dec 9th 05:00 -- 07:00 PM, Room: Poster Session 3 #789
More from the Same Authors
- 2021 : Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization »
  Masahiro Kato · Kei Nakagawa · Kenshi Abe · Tetsuro Morimura
- 2021 : Learning Causal Relationships from Conditional Moment Restrictions by Importance Weighting »
  Shota Yasui
- 2021 : Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage »
  Masatoshi Uehara · Wen Sun
- 2022 Poster: Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems »
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2021 : Representation Learning for Online and Offline RL in Low-rank MDPs »
  Masatoshi Uehara · Xuezhou Zhang · Wen Sun
- 2021 Workshop: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice »
  Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Xinkun Nie · Masatoshi Uehara · Kelly Zhang
- 2021 : Representation Learning for Online and Offline RL in Low-rank MDPs »
  Masatoshi Uehara · Xuezhou Zhang · Wen Sun
- 2021 Poster: Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage »
  Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun
- 2021 Poster: The Adaptive Doubly Robust Estimator and a Paradox Concerning Logging Policy »
  Masahiro Kato · Kenichiro McAlinn · Shota Yasui
- 2020 Poster: Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies »
  Nathan Kallus · Masatoshi Uehara