Model-free reinforcement learning algorithms have exhibited great potential in solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but are known to be hard to generalize across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enable transfer across different reward functions, but struggles to scale to complex environments due to compounding model errors. To get the best of both worlds, we propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards, while circumventing the challenges of model-based RL. In particular, we show that self-supervised pre-training of model-free reinforcement learning with a number of random features as rewards allows implicit modeling of long-horizon environment dynamics. Planning techniques like model-predictive control using these implicit models then enable fast adaptation to problems with new reward functions. Our method is self-supervised in that it can be trained on offline datasets without reward labels, yet can be quickly deployed on new tasks. We validate that our proposed method enables transfer across tasks on a variety of manipulation and locomotion domains in simulation, opening the door to generalist decision-making agents.
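To make the pipeline concrete, the sketch below illustrates the general recipe the abstract describes: score transitions with fixed random features, regress a new task's reward onto those features, and pick actions by ranking candidates under the implied value. This is a minimal illustration, not the paper's implementation: the function names (`random_features`, `fit_reward_weights`, `mpc_action`) are invented for this sketch, and the pre-trained per-feature Q-functions are replaced by one-step feature evaluations purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACT_DIM, NUM_FEATURES = 4, 2, 64

# Fixed random projection, standing in for the random-feature rewards
# used during self-supervised pre-training.
W = rng.normal(size=(NUM_FEATURES, STATE_DIM + ACT_DIM))

def random_features(state, action):
    """Bounded random pseudo-rewards for a (state, action) pair."""
    return np.cos(W @ np.concatenate([state, action]))

def feature_q_values(state, action):
    """Stand-in for pre-trained per-feature Q-functions.

    In the actual method these would be long-horizon value estimates
    trained with model-free RL; here we use the instantaneous features
    so the sketch stays self-contained.
    """
    return random_features(state, action)

def fit_reward_weights(states, actions, rewards):
    """Regress the new task's reward onto the random features (least squares)."""
    Phi = np.stack([random_features(s, a) for s, a in zip(states, actions)])
    w, *_ = np.linalg.lstsq(Phi, np.asarray(rewards), rcond=None)
    return w

def mpc_action(state, reward_weights, num_candidates=256):
    """Random-shooting planning: score sampled actions by the implied value."""
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, ACT_DIM))
    scores = [reward_weights @ feature_q_values(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```

The key property this sketch captures is that adaptation to a new reward reduces to a linear regression over features, with no value re-training or explicit dynamics model at test time.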
Author Information
Boyuan Chen (MIT)
Chuning Zhu (University of Washington)
Pulkit Agrawal (MIT)
Kaiqing Zhang (University of Maryland, College Park)
Abhishek Gupta (University of Washington)
More from the Same Authors
-
2021 : 3D Neural Scene Representations for Visuomotor Control »
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba -
2022 : Is Conditional Generative Modeling all you need for Decision-Making? »
Anurag Ajay · Yilun Du · Abhi Gupta · Josh Tenenbaum · Tommi Jaakkola · Pulkit Agrawal -
2022 : Learning to Extrapolate: A Transductive Approach »
Aviv Netanyahu · Abhishek Gupta · Max Simchowitz · Kaiqing Zhang · Pulkit Agrawal -
2022 : Fast Adaptation via Human Diagnosis of Task Distribution Shift »
Andi Peng · Mark Ho · Aviv Netanyahu · Julie A Shah · Pulkit Agrawal -
2022 : Aligning Robot Representations with Humans »
Andreea Bobu · Andi Peng · Pulkit Agrawal · Julie A Shah · Anca Dragan -
2023 : Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity »
Jaedong Hwang · Zhang-Wei Hong · Eric Chen · Akhilan Boopathy · Pulkit Agrawal · Ila Fiete -
2023 : Universal Visual Decomposer: Long-Horizon Manipulation Made Easy »
Zichen "Charles" Zhang · Yunshuang Li · Osbert Bastani · Abhishek Gupta · Dinesh Jayaraman · Jason Ma · Luca Weihs -
2023 : Modeling Boundedly Rational Agents with Latent Inference Budgets »
Athul Jacob · Abhishek Gupta · Jacob Andreas -
2023 : Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning »
Zhaoyi Zhou · Chuning Zhu · Runlong Zhou · Qiwen Cui · Abhishek Gupta · Simon Du -
2023 : Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 : Semantically-Driven Object Search Using Partially Observed 3D Scene Graphs »
Isaac Remy · Abhishek Gupta · Karen Leung -
2023 Poster: Breadcrumbs to the Goal: Supervised Goal Selection from Human-in-the-Loop Feedback »
Marcel Torne Villasevil · Max Balsells I Pamies · Zihan Wang · Samedh Desai · Tao Chen · Pulkit Agrawal · Abhishek Gupta -
2023 Poster: Human-Guided Complexity-Controlled Abstractions »
Andi Peng · Mycal Tucker · Eoin Kenny · Noga Zaslavsky · Pulkit Agrawal · Julie A Shah -
2023 Poster: RoboHive: A Unified Framework for Robot Learning »
Vikash Kumar · Rutav Shah · Gaoyue Zhou · Vincent Moens · Vittorio Caggiano · Abhishek Gupta · Aravind Rajeswaran -
2023 Poster: A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games »
Zaiwei Chen · Kaiqing Zhang · Eric Mazumdar · Asuman Ozdaglar · Adam Wierman -
2023 Poster: Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 Poster: Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs »
Dongsheng Ding · Chen-Yu Wei · Kaiqing Zhang · Alejandro Ribeiro -
2023 Poster: Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets »
Zhang-Wei Hong · Aviral Kumar · Sathwik Karnik · Abhishek Bhandwaldar · Akash Srivastava · Joni Pajarinen · Romain Laroche · Abhishek Gupta · Pulkit Agrawal -
2023 Poster: RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability »
Chuning Zhu · Max Simchowitz · Siri Gadipudi · Abhishek Gupta -
2023 Poster: Multi-Player Zero-Sum Markov Games with Networked Separable Interactions »
Chanwoo Park · Kaiqing Zhang · Asuman Ozdaglar -
2022 : Visual Pre-training for Navigation: What Can We Learn from Noise? »
Felix Yanwei Wang · Ching-Yun Ko · Pulkit Agrawal -
2022 Poster: Redeeming intrinsic rewards via constrained optimization »
Eric Chen · Zhang-Wei Hong · Joni Pajarinen · Pulkit Agrawal -
2022 Poster: Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Abhishek Gupta · Dibya Ghosh · Sergey Levine · Pulkit Agrawal -
2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice »
Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova -
2020 Workshop: Self-Supervised Learning -- Theory and Practice »
Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing -
2020 Session: Orals & Spotlights Track 09: Reinforcement Learning »
Pulkit Agrawal · Mohammad Ghavamzadeh -
2019 Poster: Superposition of many models into one »
Brian Cheung · Alexander Terekhov · Yubei Chen · Pulkit Agrawal · Bruno Olshausen -
2016 : What makes ImageNet good for Transfer Learning? »
Jacob MY Huh · Pulkit Agrawal · Alexei Efros -
2016 : Jitendra Malik and Pulkit Agrawal »
Jitendra Malik · Pulkit Agrawal -
2016 Poster: Learning to Poke by Poking: Experiential Learning of Intuitive Physics »
Pulkit Agrawal · Ashvin Nair · Pieter Abbeel · Jitendra Malik · Sergey Levine -
2016 Oral: Learning to Poke by Poking: Experiential Learning of Intuitive Physics »
Pulkit Agrawal · Ashvin Nair · Pieter Abbeel · Jitendra Malik · Sergey Levine