In offline RL, constraining the learned policy to remain close to the data is essential to prevent the policy from outputting out-of-distribution (OOD) actions with erroneously overestimated values. In principle, generative adversarial networks (GANs) offer an elegant way to do so, since the discriminator directly provides a probability that quantifies distributional shift. In practice, however, GAN-based offline RL methods have not outperformed alternative approaches, perhaps because the generator is trained both to fool the discriminator and to maximize return, two objectives that are often at odds with each other. In this paper, we show that the issue of conflicting objectives can be resolved by training two generators: one that maximizes return, and another that captures the "remainder" of the data distribution in the offline dataset, such that the mixture of the two is close to the behavior policy. We show that having two generators not only enables an effective GAN-based offline RL method, but also approximates a support constraint, where the policy does not need to match the entire data distribution, only the slice of the data that leads to high long-term performance. We name our method DASCO, for Dual-Generator Adversarial Support Constrained Offline RL. On benchmark tasks that require learning from sub-optimal data, DASCO significantly outperforms prior methods that enforce a distribution constraint.
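The core argument above can be checked on a toy example. The claim is that a return-maximizing generator paired with a "remainder" generator can match the data as a mixture, while the policy itself only covers high-return, in-support actions. The sketch below is an illustration under stated assumptions, not DASCO's training procedure. It fixes a single state with five discrete actions and replaces the neural generators and learned discriminator with explicit probability tables. It also assumes a fixed mixture weight `w` and measures mismatch with the Jensen-Shannon divergence, which is what the standard GAN objective measures at the optimal discriminator. The helper names (`mixture`, `residual_generator`, etc.), the dataset, and the Q-values are all hypothetical.

```python
import math
from typing import List, Optional

# A distribution over a small, discrete action set at one fixed state (toy assumption).
Dist = List[float]


def mixture(pi: Dist, aux: Dist, w: float) -> Dist:
    """Action distribution of the two-generator mixture w * pi + (1 - w) * aux."""
    return [w * p + (1.0 - w) * q for p, q in zip(pi, aux)]


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q), skipping actions where p has no mass (0 * log 0 = 0)."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def jensen_shannon(p: Dist, q: Dist) -> float:
    """JS divergence: what the standard GAN objective measures at the optimal discriminator."""
    m = [(x + y) / 2.0 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def residual_generator(data: Dist, pi: Dist, w: float) -> Optional[Dist]:
    """Hypothetical "remainder" distribution that makes the mixture equal the data.

    Solving w * pi + (1 - w) * aux = data gives aux = (data - w * pi) / (1 - w).
    This is a valid distribution only if w * pi(a) <= data(a) for every action,
    so for a small enough w, pi may concentrate on any in-support action but can
    never place mass where the data has none. Returns None when no such aux exists.
    """
    aux = [(d - w * p) / (1.0 - w) for d, p in zip(data, pi)]
    if any(x < -1e-12 for x in aux):
        return None
    return [max(x, 0.0) for x in aux]  # clip tiny negative rounding errors


# Hypothetical dataset: action 4 never appears, so its Q-value (5.0) stands in
# for an unreliable, overestimated extrapolation.
data = [0.4, 0.3, 0.2, 0.1, 0.0]
q_values = [0.0, 0.1, 0.5, 1.0, 5.0]
w = 0.1  # assumed mixture weight of the return-maximizing generator

best_in_data = [0.0, 0.0, 0.0, 1.0, 0.0]  # greedy on the best in-support action
ood_policy = [0.0, 0.0, 0.0, 0.0, 1.0]    # greedy on the overestimated OOD action

# Single generator: the return-seeking policy itself must match the data.
print("single-generator JSD:", jensen_shannon(best_in_data, data))

# Two generators: the auxiliary generator absorbs the rest of the data.
aux = residual_generator(data, best_in_data, w)
print("dual-generator JSD:", jensen_shannon(mixture(best_in_data, aux, w), data))
print("expected Q of policy:", sum(p * q for p, q in zip(best_in_data, q_values)))

# No auxiliary generator can cover for mass placed outside the data's support.
print("OOD policy residual:", residual_generator(data, ood_policy, w))
```

In this toy setting the single-generator divergence is clearly positive, the dual-generator divergence is essentially zero, and `residual_generator` returns `None` for the OOD policy. The feasibility condition `w * pi(a) <= data(a)` is what makes the mixture behave more like a support constraint than a distribution constraint. The policy is free to ignore low-return actions as long as it stays where the data has mass, and the constraint approaches a pure support constraint as `w` shrinks.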
Author Information
Quan Vuong (University of California San Diego)
Aviral Kumar (UC Berkeley)
Sergey Levine (UC Berkeley)
Yevgen Chebotar (Google)
More from the Same Authors
2021 Spotlight: Robust Predictable Control »
Ben Eysenbach · Russ Salakhutdinov · Sergey Levine -
2021 Spotlight: Offline Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 Spotlight: Pragmatic Image Compression for Human-in-the-Loop Decision-Making »
Sid Reddy · Anca Dragan · Sergey Levine -
2021 : Extending the WILDS Benchmark for Unsupervised Adaptation »
Shiori Sagawa · Pang Wei Koh · Tony Lee · Irena Gao · Sang Michael Xie · Kendrick Shen · Ananya Kumar · Weihua Hu · Michihiro Yasunaga · Henrik Marklund · Sara Beery · Ian Stavness · Jure Leskovec · Kate Saenko · Tatsunori Hashimoto · Sergey Levine · Chelsea Finn · Percy Liang -
2021 : Test Time Robustification of Deep Models via Adaptation and Augmentation »
Marvin Zhang · Sergey Levine · Chelsea Finn -
2021 : Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning »
Dhruv Shah · Ted Xiao · Alexander Toshev · Sergey Levine · brian ichter -
2021 : Data Sharing without Rewards in Multi-Task Offline Reinforcement Learning »
Tianhe Yu · Aviral Kumar · Yevgen Chebotar · Chelsea Finn · Sergey Levine · Karol Hausman -
2021 : Should I Run Offline Reinforcement Learning or Behavioral Cloning? »
Aviral Kumar · Joey Hong · Anikait Singh · Sergey Levine -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : Offline Reinforcement Learning with In-sample Q-Learning »
Ilya Kostrikov · Ashvin Nair · Sergey Levine -
2021 : C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks »
Tianjun Zhang · Ben Eysenbach · Russ Salakhutdinov · Sergey Levine · Joseph Gonzalez -
2021 : The Information Geometry of Unsupervised Reinforcement Learning »
Ben Eysenbach · Russ Salakhutdinov · Sergey Levine -
2021 : Mismatched No More: Joint Model-Policy Optimization for Model-Based RL »
Ben Eysenbach · Alexander Khazatsky · Sergey Levine · Russ Salakhutdinov -
2021 : Offline Meta-Reinforcement Learning with Online Self-Supervision »
Vitchyr Pong · Ashvin Nair · Laura Smith · Catherine Huang · Sergey Levine -
2021 : Hybrid Imitative Planning with Geometric and Predictive Costs in Offroad Environments »
Daniel Shin · Dhruv Shah · Ali Agha · Nicholas Rhinehart · Sergey Levine -
2021 : CoMPS: Continual Meta Policy Search »
Glen Berseth · Zhiwei Zhang · Grace Zhang · Chelsea Finn · Sergey Levine -
2022 : Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement »
Michael Chang · Alyssa L Dayan · Franziska Meier · Tom Griffiths · Sergey Levine · Amy Zhang -
2022 : Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes »
Aviral Kumar · Rishabh Agarwal · XINYANG GENG · George Tucker · Sergey Levine -
2022 : Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning »
Aviral Kumar · Anikait Singh · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine -
2022 : Offline Reinforcement Learning from Heteroskedastic Data Via Support Constraints »
Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine -
2022 : Skill Acquisition by Instruction Augmentation on Offline Datasets »
Ted Xiao · Harris Chan · Pierre Sermanet · Ayzaan Wahid · Anthony Brohan · Karol Hausman · Sergey Levine · Jonathan Tompson -
2022 : Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts »
Amrith Setlur · Don Dennis · Benjamin Eysenbach · Aditi Raghunathan · Chelsea Finn · Virginia Smith · Sergey Levine -
2022 : Confidence-Conditioned Value Functions for Offline Reinforcement Learning »
Joey Hong · Aviral Kumar · Sergey Levine -
2022 : Efficient Deep Reinforcement Learning Requires Regulating Statistical Overfitting »
Qiyang Li · Aviral Kumar · Ilya Kostrikov · Sergey Levine -
2022 : Contrastive Example-Based Control »
Kyle Hatch · Sarthak J Shetty · Benjamin Eysenbach · Tianhe Yu · Rafael Rafailov · Russ Salakhutdinov · Sergey Levine · Chelsea Finn -
2022 : Offline Reinforcement Learning for Customizable Visual Navigation »
Dhruv Shah · Arjun Bhorkar · Hrishit Leen · Ilya Kostrikov · Nicholas Rhinehart · Sergey Levine -
2022 : A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning »
Benjamin Eysenbach · Matthieu Geist · Sergey Levine · Russ Salakhutdinov -
2022 : Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning »
Anikait Singh · Aviral Kumar · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine -
2022 : Adversarial Policies Beat Professional-Level Go AIs »
Tony Wang · Adam Gleave · Nora Belrose · Tom Tseng · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Joseph Miller · Sergey Levine · Stuart J Russell -
2022 : PnP-Nav: Plug-and-Play Policies for Generalizable Visual Navigation Across Robots »
Dhruv Shah · Ajay Sridhar · Arjun Bhorkar · Noriaki Hirose · Sergey Levine -
2022 : A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning »
Benjamin Eysenbach · Matthieu Geist · Russ Salakhutdinov · Sergey Levine -
2022 : Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective »
Raj Ghugare · Homanga Bharadhwaj · Benjamin Eysenbach · Sergey Levine · Ruslan Salakhutdinov -
2022 : Ilya Kostrikov, Aviral Kumar »
Ilya Kostrikov · Aviral Kumar -
2022 Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad" »
Aviral Kumar · Rishabh Agarwal · Aravind Rajeswaran · Wenxuan Zhou · George Tucker · Doina Precup -
2022 Poster: MEMO: Test Time Robustness via Adaptation and Augmentation »
Marvin Zhang · Sergey Levine · Chelsea Finn -
2022 Poster: First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization »
Siddharth Reddy · Sergey Levine · Anca Dragan -
2022 Poster: Adversarial Unlearning: Reducing Confidence Along Adversarial Directions »
Amrith Setlur · Benjamin Eysenbach · Virginia Smith · Sergey Levine -
2022 Poster: Mismatched No More: Joint Model-Policy Optimization for Model-Based RL »
Benjamin Eysenbach · Alexander Khazatsky · Sergey Levine · Russ Salakhutdinov -
2022 Poster: Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity »
Abhishek Gupta · Aldo Pacchiano · Yuexiang Zhai · Sham Kakade · Sergey Levine -
2022 Poster: Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Abhishek Gupta · Dibya Ghosh · Sergey Levine · Pulkit Agrawal -
2022 Poster: You Only Live Once: Single-Life Reinforcement Learning »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 Poster: Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation »
Michael Chang · Tom Griffiths · Sergey Levine -
2022 Poster: Data-Driven Offline Decision-Making via Invariant Representation Learning »
Han Qi · Yi Su · Aviral Kumar · Sergey Levine -
2022 Poster: Contrastive Learning as Goal-Conditioned Reinforcement Learning »
Benjamin Eysenbach · Tianjun Zhang · Sergey Levine · Russ Salakhutdinov -
2022 Poster: Imitating Past Successes can be Very Suboptimal »
Benjamin Eysenbach · Soumith Udatha · Russ Salakhutdinov · Sergey Levine -
2021 Workshop: Offline Reinforcement Learning »
Rishabh Agarwal · Aviral Kumar · George Tucker · Justin Fu · Nan Jiang · Doina Precup -
2021 : Data-Driven Offline Optimization for Architecting Hardware Accelerators »
Aviral Kumar · Amir Yazdanbakhsh · Milad Hashemi · Kevin Swersky · Sergey Levine -
2021 : Offline Meta-Reinforcement Learning with Online Self-Supervision Q&A »
Vitchyr Pong · Ashvin Nair · Laura Smith · Catherine Huang · Sergey Levine -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Q&A »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 Oral: Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification »
Ben Eysenbach · Sergey Levine · Russ Salakhutdinov -
2021 Poster: Robust Predictable Control »
Ben Eysenbach · Russ Salakhutdinov · Sergey Levine -
2021 Poster: Which Mutual-Information Representation Learning Objectives are Sufficient for Control? »
Kate Rakelly · Abhishek Gupta · Carlos Florensa · Sergey Levine -
2021 Poster: COMBO: Conservative Offline Model-Based Policy Optimization »
Tianhe Yu · Aviral Kumar · Rafael Rafailov · Aravind Rajeswaran · Sergey Levine · Chelsea Finn -
2021 Poster: Outcome-Driven Reinforcement Learning via Variational Inference »
Tim G. J. Rudner · Vitchyr Pong · Rowan McAllister · Yarin Gal · Sergey Levine -
2021 Poster: Bayesian Adaptation for Covariate Shift »
Aurick Zhou · Sergey Levine -
2021 Poster: Offline Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 Poster: Pragmatic Image Compression for Human-in-the-Loop Decision-Making »
Sid Reddy · Anca Dragan · Sergey Levine -
2021 Poster: Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification »
Ben Eysenbach · Sergey Levine · Russ Salakhutdinov -
2021 Poster: Information is Power: Intrinsic Control via Information Capture »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 Poster: Conservative Data Sharing for Multi-Task Offline Reinforcement Learning »
Tianhe Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2021 Poster: Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability »
Dibya Ghosh · Jad Rahme · Aviral Kumar · Amy Zhang · Ryan Adams · Sergey Levine -
2021 Poster: Autonomous Reinforcement Learning via Subgoal Curricula »
Archit Sharma · Abhishek Gupta · Sergey Levine · Karol Hausman · Chelsea Finn -
2021 Poster: Adaptive Risk Minimization: Learning to Adapt to Domain Shift »
Marvin Zhang · Henrik Marklund · Nikita Dhawan · Abhishek Gupta · Sergey Levine · Chelsea Finn -
2020 Workshop: Offline Reinforcement Learning »
Aviral Kumar · Rishabh Agarwal · George Tucker · Lihong Li · Doina Precup -
2020 Poster: Model Inversion Networks for Model-Based Optimization »
Aviral Kumar · Sergey Levine -
2020 Poster: Multi-task Batch Reinforcement Learning with Metric Learning »
Jiachen Li · Quan Vuong · Shuang Liu · Minghua Liu · Kamil Ciosek · Henrik Christensen · Hao Su -
2020 Poster: Conservative Q-Learning for Offline Reinforcement Learning »
Aviral Kumar · Aurick Zhou · George Tucker · Sergey Levine -
2020 Tutorial: (Track3) Offline Reinforcement Learning: From Algorithm Design to Practical Applications Q&A »
Sergey Levine · Aviral Kumar -
2020 Poster: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction »
Aviral Kumar · Abhishek Gupta · Sergey Levine -
2020 Spotlight: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction »
Aviral Kumar · Abhishek Gupta · Sergey Levine -
2019 Poster: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction »
Aviral Kumar · Justin Fu · George Tucker · Sergey Levine -
2019 Poster: Better Exploration with Optimistic Actor Critic »
Kamil Ciosek · Quan Vuong · Robert Loftin · Katja Hofmann -
2019 Spotlight: Better Exploration with Optimistic Actor Critic »
Kamil Ciosek · Quan Vuong · Robert Loftin · Katja Hofmann -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang