Timezone: »
We consider how to transfer knowledge from previous tasks to a current task in long-lived and bounded agents that must solve a sequence of MDPs over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent's behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.
Author Information
Xiaoxiao Guo (University of Michigan, Ann Arbor)
Satinder Singh (University of Michigan)
Richard L Lewis (University of Michigan)
More from the Same Authors
-
2021 : GrASP: Gradient-Based Affordance Selection for Planning »
Vivek Veeriah · Zeyu Zheng · Richard L Lewis · Satinder Singh -
2022 : In-Context Policy Iteration »
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh -
2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions »
Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh -
2020 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah -
2019 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Joshua Achiam · Carlos Florensa · Christopher Grimm · Haoran Tang · Vivek Veeriah -
2019 Poster: Discovery of Useful Questions as Auxiliary Tasks »
Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2019 Poster: No-Press Diplomacy: Modeling Multi-Agent Gameplay »
Philip Paquette · Yuchen Lu · SETON STEVEN BOCCO · Max Smith · Satya O.-G. · Jonathan K. Kummerfeld · Joelle Pineau · Satinder Singh · Aaron Courville -
2018 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas -
2018 Poster: On Learning Intrinsic Rewards for Policy Gradient Methods »
Zeyu Zheng · Junhyuk Oh · Satinder Singh -
2018 Poster: Completing State Representations using Spectral Learning »
Nan Jiang · Alex Kulesza · Satinder Singh -
2017 : Afternoon Panel discussion »
Brian Skyrms · Satinder Singh · Jacob Andreas -
2017 : "Language Emergence as Boundedly Optimal Control" »
Satinder Singh -
2017 : Minimax-Regret Querying on Side Effects in Factored Markov Decision Processes »
Satinder Singh -
2017 : Invited Talk - Satindar Singh »
Satinder Singh -
2017 Symposium: Deep Reinforcement Learning »
Pieter Abbeel · Yan Duan · David Silver · Satinder Singh · Junhyuk Oh · Rein Houthooft -
2017 Poster: Repeated Inverse Reinforcement Learning »
Kareem Amin · Nan Jiang · Satinder Singh -
2017 Spotlight: Repeated Inverse Reinforcement Learning »
Kareem Amin · Nan Jiang · Satinder Singh -
2017 Poster: Value Prediction Network »
Junhyuk Oh · Satinder Singh · Honglak Lee -
2016 Workshop: Deep Reinforcement Learning »
David Silver · Satinder Singh · Pieter Abbeel · Peter Chen -
2015 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · John Schulman · Satinder Singh · David Silver -
2015 Poster: Action-Conditional Video Prediction using Deep Networks in Atari Games »
Junhyuk Oh · Xiaoxiao Guo · Honglak Lee · Richard L Lewis · Satinder Singh -
2015 Spotlight: Action-Conditional Video Prediction using Deep Networks in Atari Games »
Junhyuk Oh · Xiaoxiao Guo · Honglak Lee · Richard L Lewis · Satinder Singh -
2014 Poster: Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning »
Xiaoxiao Guo · Satinder Singh · Honglak Lee · Richard L Lewis · Xiaoshi Wang -
2013 Session: Oral Session 9 »
Satinder Singh -
2010 Poster: Reward Design via Online Gradient Ascent »
Jonathan D Sorg · Satinder Singh · Richard L Lewis -
2008 Poster: Simple Local Models for Complex Dynamical Systems »
Erik Talvitie · Satinder Singh -
2008 Oral: Simple Local Models for Complex Dynamical Systems »
Erik Talvitie · Satinder Singh -
2007 Oral: Exponential Family Predictive Representations of State »
David Wingate · Satinder Singh -
2007 Poster: Exponential Family Predictive Representations of State »
David Wingate · Satinder Singh