Learning with an objective that minimizes mismatch with a reference distribution has proven useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be used effectively for reinforcement learning (RL) tasks. Specifically, we focus on goal-conditioned reinforcement learning, where the idealized (unachievable) target distribution places full measure at the goal. We introduce a quasimetric specific to Markov Decision Processes (MDPs), use it to estimate this Wasserstein-1 distance, and show that the policy minimizing this distance is the policy that reaches the goal in as few steps as possible. Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates the Wasserstein-1 distance through its dual objective and uses that estimate to compute a supplemental reward function. Our experiments show that this reward changes smoothly with respect to transitions in the MDP and directs the agent's exploration so that it finds the goal efficiently. Finally, we combine AIM with Hindsight Experience Replay (HER) and show that the resulting algorithm learns significantly faster on several simulated robotics tasks than agents trained with other rewards that encourage exploration or accelerate learning.
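For concreteness, the dual objective mentioned above is the Kantorovich-Rubinstein dual of the Wasserstein-1 distance. The sketch below states this standard duality specialized to the goal-conditioned setting described in the abstract; the notation (\rho_\pi for the policy's state visitation distribution, f for the dual potential, g for the goal) and the closing reward form are illustrative choices made here, not equations quoted from the paper.

```latex
% Kantorovich-Rubinstein dual of the Wasserstein-1 distance between the
% policy's state visitation distribution \rho_\pi and the target \rho_g.
% Lip_d(f) <= 1 constrains f to be 1-Lipschitz under a (quasi)metric d on
% states -- here, the MDP-specific quasimetric the abstract introduces.
W_1(\rho_\pi, \rho_g)
  = \sup_{\mathrm{Lip}_d(f) \le 1}
      \mathbb{E}_{s \sim \rho_g}[f(s)] - \mathbb{E}_{s \sim \rho_\pi}[f(s)]
  = \sup_{\mathrm{Lip}_d(f) \le 1}
      f(g) - \mathbb{E}_{s \sim \rho_\pi}[f(s)]
% The second equality holds because the idealized target \rho_g places
% full measure at the goal state g. One potential-based supplemental
% reward consistent with this view (an illustrative form, not quoted from
% the paper) is r(s, s') = f(s') - f(s); since f is Lipschitz under d,
% this reward changes smoothly across MDP transitions.
```

Under this reading, an adversary ascends the dual objective to tighten the distance estimate while the policy descends it, and the Lipschitz constraint is what keeps the induced reward smooth across transitions.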
Author Information
Ishan Durugkar (University of Texas at Austin)
Mauricio Tec (University of Texas at Austin)
Scott Niekum (University of Texas at Austin)
Peter Stone (The University of Texas at Austin, Sony AI)
More from the Same Authors
- 2020 : Paper 19: Multiagent Driving Policy for Congestion Reduction in a Large Scale Scenario
  Jiaxun Cui · Peter Stone
- 2021 : Task-Independent Causal State Abstraction
  Zizhao Wang · Xuesu Xiao · Yuke Zhu · Peter Stone
- 2021 : Leveraging Information about Background Music in Human-Robot Interaction
  Elad Liebman · Peter Stone
- 2021 : Wasserstein Distance Maximizing Intrinsic Control
  Ishan Durugkar · Steven Hansen · Stephen Spencer · Volodymyr Mnih
- 2021 : Safe Evaluation For Offline Learning: Are We Ready To Deploy?
  Hager Radi · Josiah Hanna · Peter Stone · Matthew Taylor
- 2022 : BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
  Mao Ye · Bo Liu · Stephen Wright · Peter Stone · Qiang Liu
- 2022 : Language-guided Task Adaptation for Imitation Learning
  Prasoon Goyal · Raymond Mooney · Scott Niekum
- 2022 : ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning
  Eddy Hudson · Ishan Durugkar · Garrett Warnell · Peter Stone
- 2022 : A Ranking Game for Imitation Learning
  Harshit Sushil Sikchi · Akanksha Saran · Wonjoon Goo · Scott Niekum
- 2023 Poster: FAMO: Fast Adaptive Multitask Optimization
  Bo Liu · Yihao Feng · Peter Stone · Qiang Liu
- 2023 Poster: ELDEN: Exploration via Local Dependencies
  Zizhao Wang · Jiaheng Hu · Roberto Martín-Martín · Peter Stone
- 2023 Poster: f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences
  Siddhant Agarwal · Ishan Durugkar · Peter Stone · Amy Zhang
- 2023 Poster: LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
  Bo Liu · Yifeng Zhu · Chongkai Gao · Yihao Feng · Qiang Liu · Yuke Zhu · Peter Stone
- 2022 : Panel RL Theory-Practice Gap
  Peter Stone · Matej Balog · Jonas Buchli · Jason Gauci · Dhruv Madeka
- 2022 : Panel RL Benchmarks
  Minmin Chen · Pablo Samuel Castro · Caglar Gulcehre · Tony Jebara · Peter Stone
- 2022 : Invited talk: Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning
  Peter Stone
- 2022 : Human in the Loop Learning for Robot Navigation and Task Learning from Implicit Human Feedback
  Peter Stone
- 2022 Workshop: All Things Attention: Bridging Different Perspectives on Attention
  Abhijat Biswas · Akanksha Saran · Khimya Khetarpal · Reuben Aronson · Ruohan Zhang · Grace Lindsay · Scott Niekum
- 2022 Poster: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
  Bo Liu · Mao Ye · Stephen Wright · Peter Stone · Qiang Liu
- 2022 Poster: Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
  James MacGlashan · Evan Archer · Alisa Devlic · Takuma Seno · Craig Sherstan · Peter Wurman · Peter Stone
- 2021 Poster: SOPE: Spectrum of Off-Policy Estimators
  Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum
- 2021 Poster: Conflict-Averse Gradient Descent for Multi-task learning
  Bo Liu · Xingchao Liu · Xiaojie Jin · Peter Stone · Qiang Liu
- 2021 Poster: Universal Off-Policy Evaluation
  Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas
- 2021 Poster: Machine versus Human Attention in Deep Reinforcement Learning Tasks
  Sihang Guo · Ruohan Zhang · Bo Liu · Yifeng Zhu · Dana Ballard · Mary Hayhoe · Peter Stone
- 2020 : Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator]
  Peter Stone · Natasha Jaques
- 2020 : Invited Speaker: Peter Stone (The University of Texas at Austin) on Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination
  Peter Stone
- 2020 : Panel discussion
  Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel
- 2020 : Discussion Panel
  Pete Florence · Dorsa Sadigh · Carolina Parada · Jeannette Bohg · Roberto Calandra · Peter Stone · Fabio Ramos
- 2020 : Invited talk: Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning"
  Peter Stone
- 2020 Poster: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
  Lemeng Wu · Bo Liu · Peter Stone · Qiang Liu
- 2020 Poster: Bayesian Robust Optimization for Imitation Learning
  Daniel S. Brown · Scott Niekum · Marek Petrik
- 2020 Poster: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
  Siddharth Desai · Ishan Durugkar · Haresh Karnan · Garrett Warnell · Josiah Hanna · Peter Stone
- 2019 : Scott Niekum: Scaling Probabilistically Safe Learning to Robotics
  Scott Niekum
- 2018 : Peter Stone
  Peter Stone
- 2018 : Control Algorithms for Imitation Learning from Observation
  Peter Stone
- 2018 : Peter Stone
  Peter Stone
- 2016 : Peter Stone (University of Texas at Austin)
  Peter Stone
- 2015 Workshop: Learning, Inference and Control of Multi-Agent Systems
  Vicenç Gómez · Gerhard Neumann · Jonathan S Yedidia · Peter Stone
- 2015 Poster: Policy Evaluation Using the Ω-Return
  Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris