Timezone: »
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task” that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
Author Information
David Abel (DeepMind)
Will Dabney (DeepMind)
Anna Harutyunyan (DeepMind)
Mark Ho (Princeton University)
Michael Littman (Brown University)
Doina Precup (DeepMind)
Satinder Singh (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: On the Expressivity of Markov Reward »
Fri. Dec 10th 04:30 -- 06:00 PM Room
More from the Same Authors
-
2021 Spotlight: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Spotlight: Flexible Option Learning »
Martin Klissarov · Doina Precup -
2021 Spotlight: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 : Understanding and Preventing Capacity Loss in Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney -
2021 : Bayesian Exploration for Lifelong Reinforcement Learning »
Haotian Fu · Shangqun Yu · Michael Littman · George Konidaris -
2021 : GrASP: Gradient-Based Affordance Selection for Planning »
Vivek Veeriah · Zeyu Zheng · Richard L Lewis · Satinder Singh -
2021 : Policy Gradients Incorporating the Future »
David Venuto · Elaine Lau · Doina Precup · Ofir Nachum -
2021 : A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2022 : In-Context Policy Iteration »
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh -
2022 : In-context Reinforcement Learning with Algorithm Distillation »
Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih -
2022 : Optimistic Meta-Gradients »
Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh -
2022 : In-context Reinforcement Learning with Algorithm Distillation »
Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih -
2023 Poster: Deep Reinforcement Learning with Plasticity Injection »
Evgenii Nikishin · Junhyuk Oh · Georg Ostrovski · Clare Lyle · Razvan Pascanu · Will Dabney · Andre Barreto -
2023 Poster: Optimistic Meta-Gradients »
Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh -
2023 Poster: A Definition of Continual Reinforcement Learning »
David Abel · Andre Barreto · Benjamin Van Roy · Doina Precup · Hado van Hasselt · Satinder Singh -
2023 Poster: Large Language Models can Implement Policy Iteration »
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh -
2023 Poster: Discovering Representations for Transfer with Successor Features and the Deep Option Keyboard »
Wilka Carvalho Carvalho · Andre Saraiva · Angelos Filos · Andrew Lampinen · Loic Matthey · Richard L Lewis · Honglak Lee · Satinder Singh · Danilo Jimenez Rezende · Daniel Zoran -
2023 Poster: Structured State Space Models for In-Context Reinforcement Learning »
Chris Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani -
2022 Spotlight: Lightning Talks 4A-4 »
Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare -
2022 Spotlight: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning »
Yunhao Tang · Remi Munos · Mark Rowland · Bernardo Avila Pires · Will Dabney · Marc Bellemare -
2022 Spotlight: Evaluation beyond Task Performance: Analyzing Concepts in AlphaZero in Hex »
Charles Lovering · Jessica Forde · George Konidaris · Ellie Pavlick · Michael Littman -
2022 Poster: Palm up: Playing in the Latent Manifold for Unsupervised Pretraining »
Hao Liu · Tom Zahavy · Volodymyr Mnih · Satinder Singh -
2022 Poster: Approximate Value Equivalence »
Christopher Grimm · Andre Barreto · Satinder Singh -
2022 Poster: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning »
Yunhao Tang · Remi Munos · Mark Rowland · Bernardo Avila Pires · Will Dabney · Marc Bellemare -
2022 Poster: Faster Deep Reinforcement Learning with Slower Online Network »
Kavosh Asadi · Rasool Fakoor · Omer Gottesman · Taesup Kim · Michael Littman · Alexander Smola -
2022 Poster: Evaluation beyond Task Performance: Analyzing Concepts in AlphaZero in Hex »
Charles Lovering · Jessica Forde · George Konidaris · Ellie Pavlick · Michael Littman -
2022 Poster: Model-based Lifelong Reinforcement Learning with Bayesian Exploration »
Haotian Fu · Shangqun Yu · Michael Littman · George Konidaris -
2022 Poster: Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction »
Dilip Arumugam · Satinder Singh -
2021 : Invited Speaker Panel »
Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker -
2021 : Reducing the Information Horizon of Bayes-Adaptive Markov Decision Processes via Epistemic State Abstraction »
Dilip Arumugam · Satinder Singh -
2021 : Bootstrapped Meta-Learning »
Sebastian Flennerhag · Yannick Schroecker · Tom Zahavy · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 Poster: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Poster: Flexible Option Learning »
Martin Klissarov · Doina Precup -
2021 Poster: A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio -
2021 Poster: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation »
Emmanuel Bengio · Moksh Jain · Maksym Korablyov · Doina Precup · Yoshua Bengio -
2021 Poster: Discovery of Options via Meta-Learned Subgoals »
Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions »
Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh -
2021 Poster: Temporally Abstract Partial Models »
Khimya Khetarpal · Zafarali Ahmed · Gheorghe Comanici · Doina Precup -
2021 Poster: The Difficulty of Passive Learning in Deep Reinforcement Learning »
Georg Ostrovski · Pablo Samuel Castro · Will Dabney -
2020 : Mini-panel discussion 3 - Prioritizing Real World RL Challenges »
Chelsea Finn · Thomas Dietterich · Angela Schoellig · Anca Dragan · Anusha Nagabandi · Doina Precup -
2020 Workshop: The Challenges of Real World Reinforcement Learning »
Daniel Mankowitz · Gabriel Dulac-Arnold · Shie Mannor · Omer Gottesman · Anusha Nagabandi · Doina Precup · Timothy A Mann · Gabriel Dulac-Arnold -
2020 Poster: Discovering Reinforcement Learning Algorithms »
Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver -
2020 Poster: Value-driven Hindsight Modelling »
Arthur Guez · Fabio Viola · Theophane Weber · Lars Buesing · Steven Kapturowski · Doina Precup · David Silver · Nicolas Heess -
2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online »
Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver -
2020 Poster: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Spotlight: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Poster: A Self-Tuning Actor-Critic Algorithm »
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2020 Poster: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning »
Christopher Grimm · Andre Barreto · Satinder Singh · David Silver -
2020 Spotlight: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2019 : Abstraction & Meta-Reinforcement Learning »
David Abel -
2019 Demonstration: The Option Keyboard: Combining Skills in Reinforcement Learning »
Daniel Toyama · Shaobo Hou · Gheorghe Comanici · Andre Barreto · Doina Precup · Shibl Mourad · Eser Aygün · Philippe Hamel -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: The Option Keyboard: Combining Skills in Reinforcement Learning »
Andre Barreto · Diana Borsa · Shaobo Hou · Gheorghe Comanici · Eser Aygün · Philippe Hamel · Daniel Toyama · jonathan j hunt · Shibl Mourad · David Silver · Doina Precup -
2019 Poster: On the Utility of Learning about Humans for Human-AI Coordination »
Micah Carroll · Rohin Shah · Mark Ho · Tom Griffiths · Sanjit Seshia · Pieter Abbeel · Anca Dragan -
2019 Poster: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2019 Spotlight: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2017 : Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 : Spotlights & Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 : Best Paper Award and Talk — Learning with options that terminate off-policy (Anna Harutyunyan) »
Anna Harutyunyan -
2017 Poster: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2016 Workshop: The Future of Interactive Machine Learning »
Kory Mathewson @korymath · Kaushik Subramanian · Mark Ho · Robert Loftin · Joseph L Austerweil · Anna Harutyunyan · Doina Precup · Layla El Asri · Matthew Gombolay · Jerry Zhu · Sonia Chernova · Charles Isbell · Patrick M Pilarski · Weng-Keen Wong · Manuela Veloso · Julie A Shah · Matthew Taylor · Brenna Argall · Michael Littman -
2016 Oral: Showing versus doing: Teaching by demonstration »
Mark Ho · Michael Littman · James MacGlashan · Fiery Cushman · Joseph L Austerweil -
2016 Poster: Showing versus doing: Teaching by demonstration »
Mark Ho · Michael Littman · James MacGlashan · Fiery Cushman · Joe Austerweil · Joseph L Austerweil -
2016 Poster: Safe and Efficient Off-Policy Reinforcement Learning »
Remi Munos · Tom Stepleton · Anna Harutyunyan · Marc Bellemare