Timezone: »
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analyses, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.
Author Information
Pablo Samuel Castro (Google)
Tyler Kastner (McGill University)
Prakash Panangaden (McGill University, Montreal)
Mark Rowland (DeepMind)
More from the Same Authors
-
2020 : GANterpretations »
Pablo Samuel Castro -
2021 : Lifting the veil on hyper-parameters for value-baseddeep reinforcement learning »
João Madeira Araújo · Johan Obando Ceron · Pablo Samuel Castro -
2021 : Understanding and Preventing Capacity Loss in Reinforcement Learning »
Clare Lyle · Mark Rowland · Will Dabney -
2021 : Lifting the veil on hyper-parameters for value-baseddeep reinforcement learning »
João Madeira Araújo · Johan Obando Ceron · Pablo Samuel Castro -
2021 : Invited Talk: Pablo Castro (Google Brain) on Estimating Policy Functions in Payment Systems using Reinforcement Learning »
Pablo Samuel Castro -
2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? »
Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine -
2021 Workshop: Advances in Programming Languages and Neurosymbolic Systems (AIPLANS) »
Breandan Considine · Disha Shrivastava · David Yu-Tung Hui · Chin-Wei Huang · Shawn Tan · Xujie Si · Prakash Panangaden · Guy Van den Broeck · Daniel Tarlow -
2021 Oral: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2021 : Lifting the veil on hyper-parameters for value-baseddeep reinforcement learning »
João Madeira Araújo · Johan Obando Ceron · Pablo Samuel Castro -
2021 Poster: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2021 Poster: Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation »
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Remi Munos · Michal Valko -
2021 Poster: The Difficulty of Passive Learning in Deep Reinforcement Learning »
Georg Ostrovski · Pablo Samuel Castro · Will Dabney -
2020 : Contributed Talk #3: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning »
Rishabh Agarwal · Marlos C. Machado · Pablo Samuel Castro · Marc Bellemare -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2019 Spotlight: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2015 Poster: Basis refinement strategies for linear value function approximation in MDPs »
Gheorghe Comanici · Doina Precup · Prakash Panangaden -
2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming »
Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden -
2008 Poster: Bounding Performance Loss in Approximate MDP Homomorphisms »
Doina Precup · Jonathan Taylor Taylor · Prakash Panangaden