Offline reinforcement learning (RL) enables learning a decision-making policy without interacting with the environment, making it particularly valuable when such interactions are costly. However, a known challenge for offline RL algorithms is the distributional mismatch between the state-action distributions of the learned policy and of the dataset, which can significantly degrade performance. State-of-the-art algorithms address this mismatch by constraining the policy to align with the state-action pairs in the dataset. However, this strategy struggles on datasets that predominantly consist of trajectories collected by low-performing policies and contain only a few trajectories from high-performing ones: the constraint to align with the data leads the policy to imitate the low-performing behaviors that dominate the dataset. Our key insight for addressing this issue is to constrain the policy toward the policy that collected the good parts of the dataset rather than toward all of the data. To this end, we optimize the importance sampling weights to emulate sampling data from a data distribution generated by a nearly optimal policy. Our method achieves considerable performance gains (up to five times better) over existing approaches when combined with state-of-the-art offline RL algorithms, across 72 imbalanced datasets with varying types of imbalance.
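As a rough illustration of the reweighting idea described in the abstract, the sketch below reweights an offline dataset so that minibatches emulate data collected by a higher-performing behavior policy. This is a hedged sketch, not the paper's exact algorithm: the softmax-over-returns weighting, the temperature `beta`, and the function names are illustrative assumptions.

```python
# Minimal sketch (assumed scheme, not the paper's exact method): sample
# trajectories in proportion to a softmax over their returns, so that
# high-return trajectories dominate minibatches even when the dataset is
# mostly low-performing.
import numpy as np

rng = np.random.default_rng(0)

def trajectory_weights(returns, beta=1.0):
    """Softmax over trajectory returns; lower `beta` concentrates sampling
    on high-return trajectories (illustrative choice of weighting)."""
    z = (returns - returns.max()) / beta  # shift max to 0 for stability
    w = np.exp(z)
    return w / w.sum()

def sample_batch(trajectories, returns, batch_size=4, beta=1.0):
    """Draw a minibatch of trajectories under the importance weights."""
    w = trajectory_weights(np.asarray(returns, dtype=np.float64), beta)
    idx = rng.choice(len(trajectories), size=batch_size, p=w)
    return [trajectories[i] for i in idx]

# Toy imbalanced dataset: mostly low-return trajectories, a few good ones.
returns = [1.0, 1.2, 0.8, 9.5, 1.1, 9.8]
trajectories = [f"traj_{i}" for i in range(len(returns))]
print(sample_batch(trajectories, returns, batch_size=6, beta=0.5))
```

With a small `beta`, sampling concentrates on the rare high-return trajectories, so a dataset-alignment constraint applied on these reweighted minibatches anchors the policy to the good behavior rather than to the low-performing majority.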
Author Information
Zhang-Wei Hong (Massachusetts Institute of Technology)
Aviral Kumar (UC Berkeley)
Sathwik Karnik (MIT Computer Science and Artificial Intelligence Laboratory, Electrical Engineering & Computer Science)
Abhishek Bhandwaldar (IBM)
Akash Srivastava (MIT-IBM Watson AI Lab)
Joni Pajarinen (Aalto University)
Romain Laroche (Microsoft Research)
Abhishek Gupta (University of Washington)
Pulkit Agrawal (MIT)
More from the Same Authors
-
2021 : ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation »
Chuang Gan · Jeremy Schwartz · Seth Alter · Damian Mrowca · Martin Schrimpf · James Traer · Julian De Freitas · Jonas Kubilius · Abhishek Bhandwaldar · Nick Haber · Megumi Sano · Kuno Kim · Elias Wang · Michael Lingelbach · Aidan Curtis · Kevin Feigelis · Daniel Bear · Dan Gutfreund · David Cox · Antonio Torralba · James J DiCarlo · Josh Tenenbaum · Josh McDermott · Dan Yamins -
2021 : 3D Neural Scene Representations for Visuomotor Control »
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba -
2021 : Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning »
Yi Zhao · Rinu Boney · Alexander Ilin · Juho Kannala · Joni Pajarinen -
2022 Poster: Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning »
Riashat Islam · Hongyu Zang · Anirudh Goyal · Alex Lamb · Kenji Kawaguchi · Xin Li · Romain Laroche · Yoshua Bengio · Remi Tachet des Combes -
2022 : Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes »
Aviral Kumar · Rishabh Agarwal · XINYANG GENG · George Tucker · Sergey Levine -
2022 : Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning »
Aviral Kumar · Anikait Singh · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine -
2022 : Offline Reinforcement Learning from Heteroskedastic Data Via Support Constraints »
Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine -
2022 : Is Conditional Generative Modeling all you need for Decision-Making? »
Anurag Ajay · Yilun Du · Abhi Gupta · Josh Tenenbaum · Tommi Jaakkola · Pulkit Agrawal -
2022 : Learning to Extrapolate: A Transductive Approach »
Aviv Netanyahu · Abhishek Gupta · Max Simchowitz · Kaiqing Zhang · Pulkit Agrawal -
2022 : Confidence-Conditioned Value Functions for Offline Reinforcement Learning »
Joey Hong · Aviral Kumar · Sergey Levine -
2022 : Efficient Deep Reinforcement Learning Requires Regulating Statistical Overfitting »
Qiyang Li · Aviral Kumar · Ilya Kostrikov · Sergey Levine -
2022 : Constrained Imitation Q-learning with Earth Mover’s Distance reward »
WENYAN Yang · Nataliya Strokina · Joni Pajarinen · Joni-kristian Kamarainen -
2022 : Fast Adaptation via Human Diagnosis of Task Distribution Shift »
Andi Peng · Mark Ho · Aviv Netanyahu · Julie A Shah · Pulkit Agrawal -
2022 : Aligning Robot Representations with Humans »
Andreea Bobu · Andi Peng · Pulkit Agrawal · Julie A Shah · Anca Dragan -
2023 : Learning from Invalid Data: On Constraint Satisfaction in Generative Models »
Giorgio Giannone · Lyle Regenwetter · Akash Srivastava · Dan Gutfreund · Faez Ahmed -
2023 : Aligning Optimization Trajectories with Diffusion Models for Constrained Design Generation »
Giorgio Giannone · Akash Srivastava · Ole Winther · Faez Ahmed -
2023 : Robotic Offline RL from Internet Videos via Value-Function Pre-Training »
Chethan Bhateja · Derek Guo · Dibya Ghosh · Anikait Singh · Manan Tomar · Quan Vuong · Yevgen Chebotar · Sergey Levine · Aviral Kumar -
2023 : Vision-Language Models Provide Promptable Representations for Reinforcement Learning »
William Chen · Oier Mees · Aviral Kumar · Sergey Levine -
2023 : Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models »
Kevin Black · Mitsuhiko Nakamoto · Pranav Atreya · Homer Walke · Chelsea Finn · Aviral Kumar · Sergey Levine -
2023 : Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity »
Jaedong Hwang · Zhang-Wei Hong · Eric Chen · Akhilan Boopathy · Pulkit Agrawal · Ila Fiete -
2023 : Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction »
Han Qi · Stefano Rando · XINYANG GENG · Iku Ohama · Aviral Kumar · Sergey Levine -
2023 : Universal Visual Decomposer: Long-Horizon Manipulation Made Easy »
Zichen "Charles" Zhang · Yunshuang Li · Osbert Bastani · Abhishek Gupta · Dinesh Jayaraman · Jason Ma · Luca Weihs -
2023 : Modeling Boundedly Rational Agents with Latent Inference Budgets »
Athul Jacob · Abhishek Gupta · Jacob Andreas -
2023 : Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning »
Zhaoyi Zhou · Chuning Zhu · Runlong Zhou · Qiwen Cui · Abhishek Gupta · Simon Du -
2023 : Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 : Semantically-Driven Object Search Using Partially Observed 3D Scene Graphs »
Isaac Remy · Abhishek Gupta · Karen Leung -
2023 : Scaling Offline Q-Learning with Vision Transformers »
Yingjie Miao · Jordi Orbay · Rishabh Agarwal · Aviral Kumar · George Tucker · Aleksandra Faust -
2023 Poster: Self-Supervised Reinforcement Learning that Transfers using Random Features »
Boyuan Chen · Chuning Zhu · Pulkit Agrawal · Kaiqing Zhang · Abhishek Gupta -
2023 Poster: Breadcrumbs to the Goal: Supervised Goal Selection from Human-in-the-Loop Feedback »
Marcel Torne Villasevil · Max Balsells I Pamies · Zihan Wang · Samedh Desai · Tao Chen · Pulkit Agrawal · Abhishek Gupta -
2023 Poster: ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints »
Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine -
2023 Poster: Human-Guided Complexity-Controlled Abstractions »
Andi Peng · Mycal Tucker · Eoin Kenny · Noga Zaslavsky · Pulkit Agrawal · Julie A Shah -
2023 Poster: Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning »
Hongyu Zang · Xin Li · Leiji Zhang · Yang Liu · Baigui Sun · Riashat Islam · Remi Tachet des Combes · Romain Laroche -
2023 Poster: RoboHive: A Unified Framework for Robot Learning »
Vikash Kumar · Rutav Shah · Gaoyue Zhou · Vincent Moens · Vittorio Caggiano · Abhishek Gupta · Aravind Rajeswaran -
2023 Poster: Identifiability Guarantees for Causal Disentanglement from Soft Interventions »
Jiaqi Zhang · Kristjan Greenewald · Chandler Squires · Akash Srivastava · Karthikeyan Shanmugam · Caroline Uhler -
2023 Poster: Post-processing Private Synthetic Data for Improving Utility on Selected Measures »
Hao Wang · Shivchander Sudalairaj · John Henning · Kristjan Greenewald · Akash Srivastava -
2023 Poster: Towards robust and generalizable representations of extracellular data using contrastive learning »
Ankit Vishnubhotla · Charlotte Loh · Akash Srivastava · Liam Paninski · Cole Hurwitz -
2023 Poster: Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 Poster: Hybrid Search for Efficient Planning with Completeness Guarantees »
Kalle Kujanpää · Joni Pajarinen · Alexander Ilin -
2023 Poster: RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability »
Chuning Zhu · Max Simchowitz · Siri Gadipudi · Abhishek Gupta -
2023 Poster: Aligning Optimization Trajectories with Diffusion Models for Constrained Design Generation »
Giorgio Giannone · Akash Srivastava · Ole Winther · Faez Ahmed -
2023 Poster: Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning »
Mitsuhiko Nakamoto · Simon Zhai · Anikait Singh · Max Sobol Mark · Yi Ma · Chelsea Finn · Aviral Kumar · Sergey Levine -
2023 Poster: Analyzing Generalization of Neural Networks through Loss Path Kernels »
Yilan Chen · Wei Huang · Hao Wang · Charlotte Loh · Akash Srivastava · Lam Nguyen · Lily Weng -
2022 : Ilya Kostrikov, Aviral Kumar »
Ilya Kostrikov · Aviral Kumar -
2022 : Visual Pre-training for Navigation: What Can We Learn from Noise? »
Felix Yanwei Wang · Ching-Yun Ko · Pulkit Agrawal -
2022 Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad" »
Aviral Kumar · Rishabh Agarwal · Aravind Rajeswaran · Wenxuan Zhou · George Tucker · Doina Precup -
2022 Poster: Redeeming intrinsic rewards via constrained optimization »
Eric Chen · Zhang-Wei Hong · Joni Pajarinen · Pulkit Agrawal -
2022 Poster: When does return-conditioned supervised learning work for offline reinforcement learning? »
David Brandfonbrener · Alberto Bietti · Jacob Buckman · Romain Laroche · Joan Bruna -
2022 Poster: DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning »
Quan Vuong · Aviral Kumar · Sergey Levine · Yevgen Chebotar -
2022 Poster: Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Abhishek Gupta · Dibya Ghosh · Sergey Levine · Pulkit Agrawal -
2022 Poster: Data-Driven Offline Decision-Making via Invariant Representation Learning »
Han Qi · Yi Su · Aviral Kumar · Sergey Levine -
2021 Workshop: Offline Reinforcement Learning »
Rishabh Agarwal · Aviral Kumar · George Tucker · Justin Fu · Nan Jiang · Doina Precup -
2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice »
Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova -
2021 : Data-Driven Offline Optimization for Architecting Hardware Accelerators »
Aviral Kumar · Amir Yazdanbakhsh · Milad Hashemi · Kevin Swersky · Sergey Levine -
2021 Poster: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs »
harsh satija · Philip Thomas · Joelle Pineau · Romain Laroche -
2021 Poster: Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates »
Romain Laroche · Remi Tachet des Combes -
2020 Workshop: Offline Reinforcement Learning »
Aviral Kumar · Rishabh Agarwal · George Tucker · Lihong Li · Doina Precup -
2020 Workshop: Self-Supervised Learning -- Theory and Practice »
Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing -
2020 Poster: Model Inversion Networks for Model-Based Optimization »
Aviral Kumar · Sergey Levine -
2020 Poster: Conservative Q-Learning for Offline Reinforcement Learning »
Aviral Kumar · Aurick Zhou · George Tucker · Sergey Levine -
2020 Tutorial: (Track3) Offline Reinforcement Learning: From Algorithm Design to Practical Applications Q&A »
Sergey Levine · Aviral Kumar -
2020 Session: Orals & Spotlights Track 09: Reinforcement Learning »
Pulkit Agrawal · Mohammad Ghavamzadeh -
2020 Poster: Learning Dynamic Belief Graphs to Generalize on Text-Based Games »
Ashutosh Adhikari · Xingdi Yuan · Marc-Alexandre Côté · Mikuláš Zelinka · Marc-Antoine Rondeau · Romain Laroche · Pascal Poupart · Jian Tang · Adam Trischler · Will Hamilton -
2020 Poster: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction »
Aviral Kumar · Abhishek Gupta · Sergey Levine -
2020 Spotlight: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction »
Aviral Kumar · Abhishek Gupta · Sergey Levine -
2019 Poster: Superposition of many models into one »
Brian Cheung · Alexander Terekhov · Yubei Chen · Pulkit Agrawal · Bruno Olshausen -
2019 Poster: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction »
Aviral Kumar · Justin Fu · George Tucker · Sergey Levine -
2017 Poster: Hybrid Reward Architecture for Reinforcement Learning »
Harm Van Seijen · Mehdi Fatemi · Romain Laroche · Joshua Romoff · Tavian Barnes · Jeffrey Tsang -
2016 : What makes ImageNet good for Transfer Learning? »
Jacob MY Huh · Pulkit Agrawal · Alexei Efros -
2016 : Jitendra Malik and Pulkit Agrawal »
Jitendra Malik · Pulkit Agrawal -
2016 Poster: Learning to Poke by Poking: Experiential Learning of Intuitive Physics »
Pulkit Agrawal · Ashvin Nair · Pieter Abbeel · Jitendra Malik · Sergey Levine -
2016 Oral: Learning to Poke by Poking: Experiential Learning of Intuitive Physics »
Pulkit Agrawal · Ashvin Nair · Pieter Abbeel · Jitendra Malik · Sergey Levine