Timezone: »
Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision making tasks with a non-trivial element of exploration requires either specifying carefully designed reward functions or relying on indiscriminate, novelty seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we propose a technique - Human Guided Exploration (HUGE), that is able to leverage low-quality feedback from non-expert users, which is infrequent, asynchronous and noisy, to guide exploration for reinforcement learning, without requiring careful reward specification. The key idea is to separate the challenges of directed exploration and policy learning - human feedback is used to direct exploration, while self-supervised policy learning is used to independently learn unbiased behaviors from the collected data. We show that this procedure can leverage noisy, asynchronous human feedback to learn tasks with no hand-crafted reward design or exploration bonuses. We show that HUGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots.
Author Information
Marcel Torne Villasevil (Massachusetts Institute of Technology)
Max Balsells I Pamies (Department of Computer Science)
Zihan Wang (University of Washington)
Samedh Desai
Tao Chen (Massachusetts Institute of Technology)
Pulkit Agrawal (MIT)
Abhishek Gupta (University of Washington)
More from the Same Authors
-
2021 : 3D Neural Scene Representations for Visuomotor Control »
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba -
2021 : 3D Neural Scene Representations for Visuomotor Control »
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba -
2022 : Is Conditional Generative Modeling all you need for Decision-Making? »
Anurag Ajay · Yilun Du · Abhi Gupta · Josh Tenenbaum · Tommi Jaakkola · Pulkit Agrawal -
2022 : Learning to Extrapolate: A Transductive Approach »
Aviv Netanyahu · Abhishek Gupta · Max Simchowitz · Kaiqing Zhang · Pulkit Agrawal -
2022 : Fast Adaptation via Human Diagnosis of Task Distribution Shift »
Andi Peng · Mark Ho · Aviv Netanyahu · Julie A Shah · Pulkit Agrawal -
2022 : Aligning Robot Representations with Humans »
Andreea Bobu · Andi Peng · Pulkit Agrawal · Julie A Shah · Anca Dragan -
2023 : Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing »
Xinyu Hu · Pengfei Tang · Simiao Zuo · Zihan Wang · Bowen Song · Qiang Lou · Jian Jiao · Denis Charles -
2023 : Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity »
Jaedong Hwang · Zhang-Wei Hong · Eric Chen · Akhilan Boopathy · Pulkit Agrawal · Ila Fiete -
2023 : Universal Visual Decomposer: Long-Horizon Manipulation Made Easy »
Zichen "Charles" Zhang · Yunshuang Li · Osbert Bastani · Abhishek Gupta · Dinesh Jayaraman · Jason Ma · Luca Weihs -
2023 : Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing »
Xinyu Hu · Pengfei Tang · Simiao Zuo · Zihan Wang · Bowen Song · Qiang Lou · Jian Jiao · Denis Charles -
2023 : Modeling Boundedly Rational Agents with Latent Inference Budgets »
Athul Jacob · Abhishek Gupta · Jacob Andreas -
2023 : Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning »
Zhaoyi Zhou · Chuning Zhu · Runlong Zhou · Qiwen Cui · Abhishek Gupta · Simon Du -
2023 : Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning »
Zhaoyi Zhou · Chuning Zhu · Runlong Zhou · Qiwen Cui · Abhishek Gupta · Simon Du -
2023 : Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 : Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 : Semantically-Driven Object Search Using Partially Observed 3D Scene Graphs »
Isaac Remy · Abhishek Gupta · Karen Leung -
2023 : Semantically-Driven Object Search Using Partially Observed 3D Scene Graphs »
Isaac Remy · Abhishek Gupta · Karen Leung -
2023 : Universal Visual Decomposer: Long-Horizon Manipulation Made Easy »
Zichen "Charles" Zhang · Yunshuang Li · Osbert Bastani · Abhishek Gupta · Dinesh Jayaraman · Jason Ma · Luca Weihs -
2023 : Universal Visual Decomposer: Long-Horizon Manipulation Made Easy »
Zichen "Charles" Zhang · Yunshuang Li · Osbert Bastani · Abhishek Gupta · Dinesh Jayaraman · Jason Ma · Luca Weihs -
2023 Poster: Self-Supervised Reinforcement Learning that Transfers using Random Features »
Boyuan Chen · Chuning Zhu · Pulkit Agrawal · Kaiqing Zhang · Abhishek Gupta -
2023 Poster: Human-Guided Complexity-Controlled Abstractions »
Andi Peng · Mycal Tucker · Eoin Kenny · Noga Zaslavsky · Pulkit Agrawal · Julie A Shah -
2023 Poster: RoboHive: A Unified Framework for Robot Learning »
Vikash Kumar · Rutav Shah · Gaoyue Zhou · Vincent Moens · Vittorio Caggiano · Abhishek Gupta · Aravind Rajeswaran -
2023 Poster: Compositional Foundation Models for Hierarchical Planning »
Anurag Ajay · Seungwook Han · Yilun Du · Shuang Li · Abhi Gupta · Tommi Jaakkola · Josh Tenenbaum · Leslie Kaelbling · Akash Srivastava · Pulkit Agrawal -
2023 Poster: Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets »
Zhang-Wei Hong · Aviral Kumar · Sathwik Karnik · Abhishek Bhandwaldar · Akash Srivastava · Joni Pajarinen · Romain Laroche · Abhishek Gupta · Pulkit Agrawal -
2023 Poster: RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability »
Chuning Zhu · Max Simchowitz · Siri Gadipudi · Abhishek Gupta -
2022 : Visual Pre-training for Navigation: What Can We Learn from Noise? »
Felix Yanwei Wang · Ching-Yun Ko · Pulkit Agrawal -
2022 Poster: Redeeming intrinsic rewards via constrained optimization »
Eric Chen · Zhang-Wei Hong · Joni Pajarinen · Pulkit Agrawal -
2022 Poster: Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Abhishek Gupta · Dibya Ghosh · Sergey Levine · Pulkit Agrawal -
2022 Poster: Pre-Trained Language Models for Interactive Decision-Making »
Shuang Li · Xavier Puig · Chris Paxton · Yilun Du · Clinton Wang · Linxi Fan · Tao Chen · De-An Huang · Ekin Akyürek · Anima Anandkumar · Jacob Andreas · Igor Mordatch · Antonio Torralba · Yuke Zhu -
2021 : 3D Neural Scene Representations for Visuomotor Control »
Yunzhu Li · Shuang Li · Vincent Sitzmann · Pulkit Agrawal · Antonio Torralba -
2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice »
Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova -
2020 Workshop: Self-Supervised Learning -- Theory and Practice »
Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing -
2020 Session: Orals & Spotlights Track 09: Reinforcement Learning »
Pulkit Agrawal · Mohammad Ghavamzadeh -
2019 Poster: Superposition of many models into one »
Brian Cheung · Alexander Terekhov · Yubei Chen · Pulkit Agrawal · Bruno Olshausen -
2016 : What makes ImageNet good for Transfer Learning? »
Jacob MY Huh · Pulkit Agrawal · Alexei Efros -
2016 : Jitendra Malik and Pulkit Agrawal »
Jitendra Malik · Pulkit Agrawal -
2016 Poster: Learning to Poke by Poking: Experiential Learning of Intuitive Physics »
Pulkit Agrawal · Ashvin Nair · Pieter Abbeel · Jitendra Malik · Sergey Levine -
2016 Oral: Learning to Poke by Poking: Experiential Learning of Intuitive Physics »
Pulkit Agrawal · Ashvin Nair · Pieter Abbeel · Jitendra Malik · Sergey Levine