Recent work has demonstrated that when artificial agents are limited in their ability to achieve their goals, the agent designer can benefit by making the agent's goals different from the designer's. This gives rise to the optimization problem of designing the artificial agent's goals; in the RL framework, this means designing the agent's reward function. Existing attempts at solving this optimal reward problem neither leverage experience gained online during the agent's lifetime nor take advantage of knowledge about the agent's structure. In this work, we develop a gradient ascent approach with formal convergence guarantees for approximately solving the optimal reward problem online during an agent's lifetime. We show that our method generalizes a standard policy gradient approach, and we demonstrate its ability to improve reward functions in agents with various forms of limitations.
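To make the idea of gradient ascent on reward parameters concrete, here is a minimal sketch of the simplest case, in which the agent does no planning and acts by a softmax over its internal reward parameters. In that case, adjusting the internal rewards to increase the designer's objective reward is an ordinary REINFORCE-style policy gradient update. This is not the algorithm from the paper: the three-armed bandit, the function names (`softmax`, `train_reward_params`), the running-average baseline, and all hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_reward_params(true_means, steps=5000, lr=0.1, seed=0):
    """Gradient ascent on internal reward parameters theta (hypothetical setup).

    The agent acts by softmax(theta); the designer's objective reward comes
    from a noisy bandit with the given true_means. theta is updated with a
    REINFORCE-style estimate of the gradient of expected objective reward.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(true_means)  # internal reward, one parameter per action
    baseline = 0.0                   # running average of objective reward
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Designer's (objective) reward, observed online during the lifetime.
        objective_reward = true_means[action] + rng.gauss(0.0, 0.1)
        baseline += (objective_reward - baseline) / t
        advantage = objective_reward - baseline
        for a in range(len(theta)):
            # d/d theta[a] of log softmax(theta)[action]
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log
    return theta


if __name__ == "__main__":
    learned = train_reward_params([0.2, 0.8, 0.5])
    print("learned internal rewards:", learned)
```

In this degenerate setting the internal reward parameters and the policy parameters coincide. The more interesting case the abstract alludes to, where a limited planning agent sits between the reward function and the action choice, would require differentiating through that planner and is not shown here.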
Author Information
Jonathan D Sorg (University of Michigan)
Satinder Singh (University of Michigan)
Richard L Lewis (University of Michigan)
More from the Same Authors
- 2021: GrASP: Gradient-Based Affordance Selection for Planning
  Vivek Veeriah · Zeyu Zheng · Richard L Lewis · Satinder Singh
- 2022: In-Context Policy Iteration
  Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh
- 2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions
  Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh
- 2020 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah
- 2019 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Joshua Achiam · Carlos Florensa · Christopher Grimm · Haoran Tang · Vivek Veeriah
- 2019 Poster: Discovery of Useful Questions as Auxiliary Tasks
  Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh
- 2019 Poster: No-Press Diplomacy: Modeling Multi-Agent Gameplay
  Philip Paquette · Yuchen Lu · SETON STEVEN BOCCO · Max Smith · Satya O.-G. · Jonathan K. Kummerfeld · Joelle Pineau · Satinder Singh · Aaron Courville
- 2018 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas
- 2018 Poster: On Learning Intrinsic Rewards for Policy Gradient Methods
  Zeyu Zheng · Junhyuk Oh · Satinder Singh
- 2018 Poster: Completing State Representations using Spectral Learning
  Nan Jiang · Alex Kulesza · Satinder Singh
- 2017: Afternoon Panel discussion
  Brian Skyrms · Satinder Singh · Jacob Andreas
- 2017: "Language Emergence as Boundedly Optimal Control"
  Satinder Singh
- 2017: Minimax-Regret Querying on Side Effects in Factored Markov Decision Processes
  Satinder Singh
- 2017: Invited Talk - Satindar Singh
  Satinder Singh
- 2017 Symposium: Deep Reinforcement Learning
  Pieter Abbeel · Yan Duan · David Silver · Satinder Singh · Junhyuk Oh · Rein Houthooft
- 2017 Poster: Repeated Inverse Reinforcement Learning
  Kareem Amin · Nan Jiang · Satinder Singh
- 2017 Spotlight: Repeated Inverse Reinforcement Learning
  Kareem Amin · Nan Jiang · Satinder Singh
- 2017 Poster: Value Prediction Network
  Junhyuk Oh · Satinder Singh · Honglak Lee
- 2016 Workshop: Deep Reinforcement Learning
  David Silver · Satinder Singh · Pieter Abbeel · Peter Chen
- 2015 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · John Schulman · Satinder Singh · David Silver
- 2015 Poster: Action-Conditional Video Prediction using Deep Networks in Atari Games
  Junhyuk Oh · Xiaoxiao Guo · Honglak Lee · Richard L Lewis · Satinder Singh
- 2015 Spotlight: Action-Conditional Video Prediction using Deep Networks in Atari Games
  Junhyuk Oh · Xiaoxiao Guo · Honglak Lee · Richard L Lewis · Satinder Singh
- 2014 Poster: Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
  Xiaoxiao Guo · Satinder Singh · Honglak Lee · Richard L Lewis · Xiaoshi Wang
- 2013 Poster: Reward Mapping for Transfer in Long-Lived Agents
  Xiaoxiao Guo · Satinder Singh · Richard L Lewis
- 2013 Session: Oral Session 9
  Satinder Singh
- 2008 Poster: Simple Local Models for Complex Dynamical Systems
  Erik Talvitie · Satinder Singh
- 2008 Oral: Simple Local Models for Complex Dynamical Systems
  Erik Talvitie · Satinder Singh
- 2007 Oral: Exponential Family Predictive Representations of State
  David Wingate · Satinder Singh
- 2007 Poster: Exponential Family Predictive Representations of State
  David Wingate · Satinder Singh