Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin · Hoang Le · Nan Jiang · Yisong Yue
Event URL: https://openreview.net/forum?id=IsK8iKbL-I

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study how different attributes interact to affect method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced, and we invite interested researchers to contribute further to the benchmark.

Author Information

Cameron Voloshin (California Institute of Technology)
Hoang Le (Microsoft Research)
Nan Jiang (University of Illinois at Urbana-Champaign)
Yisong Yue (Caltech)