The goal of offline policy evaluation (OPE) is to evaluate target policies from logged data whose distribution may differ substantially from that of the target policy. One of the most popular empirical approaches to OPE is fitted Q-evaluation (FQE). With linear function approximation, several works have found that FQE and other OPE methods exhibit exponential error amplification in the problem horizon, except under very strong assumptions. Given the empirical success of deep FQE, in this work we examine the effect of implicit regularization, through deep architectures and loss functions, on the divergence and performance of FQE. We find that divergence does occur with simple feed-forward architectures, but that it can be mitigated by architectural and algorithmic techniques such as ResNet architectures, learning a shared representation across multiple target policies, and hypermodels. Our results suggest interesting directions for future work, including analyzing the effect of architecture on the stability of the fixed-point updates that are ubiquitous in modern reinforcement learning.
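For concreteness, below is a minimal sketch of the FQE iteration in the linear function approximation setting referenced above: each iteration regresses Q(s, a) onto the bootstrapped target r + γ·Q_k(s′, π(s′)). The names `fitted_q_evaluation`, `phi`, and `pi`, and the ridge-regularized least-squares setup, are illustrative assumptions rather than the paper's implementation; with deep FQE the closed-form solve would be replaced by gradient steps on the same squared-error objective.

```python
import numpy as np

def fitted_q_evaluation(transitions, phi, pi, gamma=0.99, n_iters=100, ridge=1e-3):
    """Illustrative linear FQE: repeatedly regress Q(s, a) onto the
    bootstrapped target r + gamma * Q_k(s', pi(s')).

    transitions: list of (s, a, r, s_next) tuples logged by a behavior policy
    phi: feature map, phi(s, a) -> 1-D feature vector
    pi:  (deterministic) target policy, pi(s) -> action
    """
    # Features for the logged (s, a) pairs and for the target policy's next actions.
    X = np.stack([phi(s, a) for (s, a, r, s_next) in transitions])                 # (n, d)
    X_next = np.stack([phi(s_next, pi(s_next)) for (_, _, _, s_next) in transitions])
    rewards = np.array([r for (_, _, r, _) in transitions])                        # (n,)

    d = X.shape[1]
    w = np.zeros(d)                                   # Q_0 = 0
    A = X.T @ X + ridge * np.eye(d)                   # fixed Gram matrix, reused every iteration
    for _ in range(n_iters):
        targets = rewards + gamma * (X_next @ w)      # bootstrapped regression targets
        w = np.linalg.solve(A, X.T @ targets)         # Q_{k+1} = argmin_w ||X w - targets||^2 + ridge
    return w

# The value estimate of the target policy at a start state s0 would then be
# phi(s0, pi(s0)) @ w; error amplification shows up as w diverging across iterations.
```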
Author Information
Xian Wu (University of California, Berkeley)
Nevena Lazic (DeepMind)
Dong Yin (DeepMind)
Cosmin Paduraru (DeepMind)
More from the Same Authors
- 2022 : Controlling Commercial Cooling Systems Using Reinforcement Learning »
  Jerry Luo · Cosmin Paduraru · Octavian Voicu · Yuri Chervonyi · Scott Munns · Jerry Li · Crystal Qian · Praneet Dutta · Daniel Mankowitz · Jared Quincy Davis · Ningjia Wu · Xingwei Yang · Chu-Ming Chang · Ted Li · Rob Rose · Mingyan Fan · Hootan Nakhost · Tinglin Liu · Deeni Fatiha · Neil Satra · Juliet Rothenberg · Molly Carlin · Satish Tallapaka · Sims Witherspoon · David Parish · Peter Dolan · Chenyu Zhao
- 2022 : Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning »
  William Wong · Praneet Dutta · Octavian Voicu · Yuri Chervonyi · Cosmin Paduraru · Jerry Luo
- 2022 : Offline evaluation in RL: soft stability weighting to combine fitted Q-learning and model-based methods »
  Briton Park · Xian Wu · Bin Yu · Angela Zhou
- 2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning »
  Caglar Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas
- 2020 Poster: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs »
  Nevena Lazic · Dong Yin · Mehrdad Farajtabar · Nir Levine · Dilan Gorur · Chris Harris · Dale Schuurmans
- 2020 Poster: An Efficient Framework for Clustered Federated Learning »
  Avishek Ghosh · Jichan Chung · Dong Yin · Kannan Ramchandran
- 2018 Poster: Data center cooling using model-predictive control »
  Nevena Lazic · Craig Boutilier · Tyler Lu · Eehern Wong · Binz Roy · Moonkyung Ryu · Greg Imwalle