Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
Seyed Kamyar Seyed Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum

In offline (batch) reinforcement learning (RL), the most successful class of approaches to date has been "support constraint" methods, in which trained policies are encouraged to remain within the support of the provided offline dataset. Support constraints, however, correspond to an overly pessimistic assumption: that actions outside the provided data may lead to worst-case outcomes. In this work, we aim to relax this assumption by obtaining uncertainty estimates for predicted action values and acting conservatively with respect to a lower-confidence bound (LCB) on those estimates. Motivated by the success of ensembles for uncertainty estimation in supervised learning, we propose MSG, an offline RL method that employs an ensemble of independently updated Q-functions. Theoretically, drawing on the literature on infinite-width neural networks, we demonstrate that the quality of the derived uncertainties depends crucially on the manner in which ensembling is performed, a phenomenon that arises from the dynamic-programming nature of RL and is overlooked by existing offline RL methods. Our theoretical predictions are corroborated by pedagogical examples on toy MDPs, as well as by empirical comparisons on benchmark continuous-control domains. On the significantly more challenging antmaze domains of the D4RL benchmark, MSG with deep ensembles surpasses highly tuned state-of-the-art methods by a wide margin. We then investigate whether efficient approximations can be similarly effective, and show that while some very efficient variants also outperform the current state of the art, they do not match the performance and robustness of MSG with deep ensembles. We hope the significant impact of this less pessimistic approach encourages increased focus on uncertainty estimation techniques for RL and draws new efforts from the deep-network uncertainty estimation research community.
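The two ideas in the abstract can be illustrated with a minimal numpy sketch: (1) acting with respect to a lower-confidence bound over an ensemble's value predictions, and (2) the independence property, where each ensemble member bootstraps its Bellman target from its own estimate rather than from a shared aggregate (such as a min or mean over the ensemble). This is an illustrative sketch, not the paper's implementation; the function names `lcb` and `independent_targets` and the coefficient `beta` are notational choices for this example.

```python
import numpy as np

def lcb(q_values, beta=2.0):
    """Lower-confidence bound over an ensemble of Q-value predictions.

    q_values: array of shape (num_members, batch), one row per ensemble member.
    Returns mean minus beta times the ensemble standard deviation.
    """
    q = np.asarray(q_values, dtype=float)
    return q.mean(axis=0) - beta * q.std(axis=0)

def independent_targets(q_next_per_member, rewards, discount=0.99):
    """Bellman targets with independent ensemble updates.

    Each member bootstraps from ITS OWN next-state estimate (row-wise),
    rather than from a shared aggregate over the ensemble -- the
    independence that the abstract argues is crucial for the quality
    of the derived uncertainties.
    """
    q_next = np.asarray(q_next_per_member, dtype=float)
    return rewards + discount * q_next  # shape: (num_members, batch)

# A policy acting conservatively would pick actions maximizing lcb(...)
# rather than the plain ensemble mean.
```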

Author Information

Seyed Kamyar Seyed Ghasemipour (University of Toronto, Vector Institute)
Shixiang (Shane) Gu (Google Brain)
Ofir Nachum (Google Brain)

More from the Same Authors

  • 2021 : Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions »
    Bogdan Mazoure · Ilya Kostrikov · Ofir Nachum · Jonathan Tompson
  • 2021 : TRAIL: Near-Optimal Imitation Learning with Suboptimal Data »
    Mengjiao Yang · Sergey Levine · Ofir Nachum
  • 2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? »
    Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine
  • 2020 Poster: Weakly-Supervised Reinforcement Learning for Controllable Behavior »
    Lisa Lee · Benjamin Eysenbach · Russ Salakhutdinov · Shixiang (Shane) Gu · Chelsea Finn
  • 2019 : Poster and Coffee Break 2 »
    Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall
  • 2019 : Poster Session »
    Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
  • 2019 : Poster Spotlight 2 »
    Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare
  • 2019 : Contributed Talks »
    Kevin Lu · Matthew Hausknecht · Ofir Nachum
  • 2019 : Poster Session »
    Rishav Chourasia · Yichong Xu · Corinna Cortes · Chien-Yi Chang · Yoshihiro Nagano · So Yeon Min · Benedikt Boecking · Phi Vu Tran · Seyed Kamyar Seyed Ghasemipour · Qianggang Ding · Shouvik Mani · Vikram Voleti · Rasool Fakoor · Miao Xu · Kenneth Marino · Lisa Lee · Volker Tresp · Jean-Francois Kagy · Marvin Zhang · Barnabas Poczos · Dinesh Khandelwal · Adrien Bardes · Evan Shelhamer · Jiacheng Zhu · Ziming Li · Xiaoyan Li · Dmitrii Krasheninnikov · Ruohan Wang · Mayoore Jaiswal · Emad Barsoum · Suvansh Sanjeev · Theeraphol Wattanavekin · Qizhe Xie · Sifan Wu · Yuki Yoshida · David Kanaa · Sina Khoshfetrat Pakazad · Mehdi Maasoumy
  • 2019 Poster: SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies »
    Seyed Kamyar Seyed Ghasemipour · Shixiang (Shane) Gu · Richard Zemel
  • 2019 Poster: Language as an Abstraction for Hierarchical Deep Reinforcement Learning »
    YiDing Jiang · Shixiang (Shane) Gu · Kevin Murphy · Chelsea Finn
  • 2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
    Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans