This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e., where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In the more general setting of approximately linear feature dynamics and arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
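The link between maximum entropy and exponential families mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a fixed target moment vector rather than a constraint defined through empirical dynamics, and the names `FEATURES`, `TARGET`, and `max_entropy_distribution` are hypothetical. Under those assumptions, the maximum-entropy distribution over a finite set has the form p(x) ∝ exp(θ·φ(x)), and θ can be found by gradient descent on the convex dual.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def feature_mean(p, features):
    """Expected feature vector under distribution p."""
    dim = len(features[0])
    return [sum(p_i * f[k] for p_i, f in zip(p, features)) for k in range(dim)]


def scores_for(theta, features):
    """Inner products theta . phi(x) for every outcome x."""
    return [sum(t * x for t, x in zip(theta, f)) for f in features]


def max_entropy_distribution(features, target, lr=0.5, steps=5000):
    """Gradient descent on the dual log-sum-exp(Phi theta) - target . theta.

    The dual gradient is E_p[phi] - target, so a stationary point
    satisfies the moment-matching constraint.
    """
    theta = [0.0] * len(features[0])
    for _ in range(steps):
        p = softmax(scores_for(theta, features))
        mean = feature_mean(p, features)
        theta = [t - lr * (m - b) for t, m, b in zip(theta, mean, target)]
    return softmax(scores_for(theta, features)), theta


# Hypothetical toy problem: four outcomes described by two binary features.
FEATURES = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
TARGET = [0.7, 0.2]  # assumed target feature expectations

p, theta = max_entropy_distribution(FEATURES, TARGET)
print("distribution:", [round(x, 4) for x in p])
print("theta:", [round(t, 4) for t in theta])
```

In this toy case the features are indicator variables, so the solution factorizes into independent marginals that match the targets. In the setting the abstract describes, the right-hand side of the constraint would itself depend on the distribution through the dynamics, which this sketch does not model.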
Author Information
Nevena Lazic (DeepMind)
Dong Yin (DeepMind)
Mehrdad Farajtabar (DeepMind)
Nir Levine (DeepMind)
Dilan Gorur (DeepMind)
Chris Harris (Google)
Dale Schuurmans (Google Brain & University of Alberta)
More from the Same Authors
- 2021 : One Pass ImageNet
  Clara Huiyi Hu · Ang Li · Daniele Calandriello · Dilan Gorur
- 2021 : Importance of Representation Learning for Off-Policy Fitted Q-Evaluation
  Xian Wu · Nevena Lazic · Dong Yin · Cosmin Paduraru
- 2020 Poster: Learning to Incentivize Other Learning Agents
  Jiachen Yang · Ang Li · Mehrdad Farajtabar · Peter Sunehag · Edward Hughes · Hongyuan Zha
- 2020 Poster: Understanding the Role of Training Regimes in Continual Learning
  Seyed Iman Mirzadeh · Mehrdad Farajtabar · Razvan Pascanu · Hassan Ghasemzadeh
- 2020 Poster: Self-Distillation Amplifies Regularization in Hilbert Space
  Hossein Mobahi · Mehrdad Farajtabar · Peter Bartlett
- 2020 Poster: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
  Hanjun Dai · Rishabh Singh · Bo Dai · Charles Sutton · Dale Schuurmans
- 2020 Poster: An Efficient Framework for Clustered Federated Learning
  Avishek Ghosh · Jichan Chung · Dong Yin · Kannan Ramchandran
- 2020 Poster: Escaping the Gravitational Pull of Softmax
  Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020 Oral: Escaping the Gravitational Pull of Softmax
  Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation
  Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020 Poster: Off-Policy Evaluation via the Regularized Lagrangian
  Mengjiao (Sherry) Yang · Ofir Nachum · Bo Dai · Lihong Li · Dale Schuurmans
- 2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation
  Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2019 : Closing Remarks
  Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
- 2019 : Audrey Durand, Douwe Kiela, Kamalika Chaudhuri moderated by Yann Dauphin
  Audrey Durand · Kamalika Chaudhuri · Yann Dauphin · Orhan Firat · Dilan Gorur · Douwe Kiela
- 2019 : Poster Spotlight 2
  Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare
- 2019 Workshop: The Optimization Foundations of Reinforcement Learning
  Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
- 2019 : Opening Remarks
  Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
- 2019 Poster: Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
  Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans
- 2019 Poster: Exponential Family Estimation via Adversarial Dynamics Embedding
  Bo Dai · Zhen Liu · Hanjun Dai · Niao He · Arthur Gretton · Le Song · Dale Schuurmans
- 2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning
  Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle
- 2019 Poster: Off-Policy Evaluation via Off-Policy Classification
  Alexander Irpan · Kanishka Rao · Konstantinos Bousmalis · Chris Harris · Julian Ibarz · Sergey Levine
- 2018 Poster: Non-delusional Q-learning and value-iteration
  Tyler Lu · Dale Schuurmans · Craig Boutilier
- 2018 Oral: Non-delusional Q-learning and value-iteration
  Tyler Lu · Dale Schuurmans · Craig Boutilier
- 2018 Poster: Data center cooling using model-predictive control
  Nevena Lazic · Craig Boutilier · Tyler Lu · Eehern Wong · Binz Roy · Moonkyung Ryu · Greg Imwalle
- 2014 Workshop: Personalization: Methods and Applications
  Yisong Yue · Khalid El-Arini · Dilan Gorur
- 2013 Workshop: What Difference Does Personalization Make?
  Dilan Gorur · Romer Rosales · Olivier Chapelle · Dorota Glowacka
- 2009 Workshop: Nonparametric Bayes
  Dilan Gorur · Francois Caron · Yee Whye Teh · David B Dunson · Zoubin Ghahramani · Michael Jordan
- 2009 Poster: Indian Buffet Processes with Power-law Behavior
  Yee Whye Teh · Dilan Gorur
- 2009 Spotlight: Indian Buffet Processes with Power-law Behavior
  Yee Whye Teh · Dilan Gorur
- 2008 Poster: Dependent Dirichlet Process Spike Sorting
  Jan Gasthaus · Frank Wood · Dilan Gorur · Yee Whye Teh
- 2008 Poster: An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering
  Dilan Gorur · Yee Whye Teh