State-of-the-art world models such as DreamerV2 have significantly improved the capabilities of model-based reinforcement learning. However, these approaches typically rely on reconstruction losses to shape their latent representations of the environment, and such losses are known to fail in environments with high-fidelity visual observations. When latent dynamics models are instead learned without a reconstruction loss, using only the reward signal, the performance of these methods also drops dramatically. We present a simple modification to DreamerV2 without reconstruction loss, inspired by the recent self-supervised learning method Bootstrap Your Own Latent (BYOL). The combination of adding a stop-gradient to the posterior, using a powerful auto-regressive model for the prior, and using a slowly updating target encoder, which we call BLAST, allows the world model to learn from signals present in both the reward and the observations, improving efficiency on our tested environment and making it significantly more robust to visual distractors.
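As a rough illustration of the "slowly updating target encoder" mentioned above, the sketch below shows an exponential moving average (EMA) update of the kind used in Bootstrap Your Own Latent. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give BLAST's exact update rule, and the function name `ema_update`, the parameter layout, and the `decay` value are all hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, decay: float = 0.99) -> Params:
    """Return new target parameters as decay * target + (1 - decay) * online.

    Assumption: the target encoder tracks the online encoder with an EMA,
    as in BYOL. The decay value here is illustrative only.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return {
        name: [
            decay * t + (1.0 - decay) * o
            for t, o in zip(target[name], online[name])
        ]
        for name in online
    }


# Toy usage: the target drifts toward the (fixed) online weights.
online = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}
for _ in range(3):
    target = ema_update(target, online, decay=0.5)
```

In a setup like this, the target encoder receives no gradients of its own and changes only through the averaging step, which is one common way to provide stable prediction targets when no reconstruction loss is used.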
Author Information
Keiran Paster (University of Toronto)
Lev McKinney (University of Toronto)
Sheila McIlraith (University of Toronto and Vector Institute)
Jimmy Ba (University of Toronto / Vector Institute)
More from the Same Authors
-
2020 : Poster #6 »
Sheila McIlraith -
2021 : Avoiding Negative Side Effects by Considering Others »
Parand Alizadeh Alamdari · Toryn Klassen · Rodrigo Toro Icarte · Sheila McIlraith -
2022 : Large Language Models Are Human-Level Prompt Engineers »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines »
Andrew Li · Zizhao Chen · Pashootan Vaezipoor · Toryn Klassen · Rodrigo Toro Icarte · Sheila McIlraith -
2022 : On The Fragility of Learned Reward Functions »
Lev McKinney · Yawen Duan · Adam Gleave · David Krueger -
2022 : Temporary Goals for Exploration »
Haoyang Xu · Jimmy Ba · Silviu Pitis · Harris Chan -
2022 : Guiding Exploration Towards Impactful Actions »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2022 : Steering Large Language Models using APE »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Keiran Paster · Silviu Pitis · Harris Chan · Jimmy Ba -
2022 : Epistemic Side Effects & Avoiding Them (Sometimes) »
Toryn Klassen · Parand Alizadeh Alamdari · Sheila McIlraith -
2022 : Rational Multi-Objective Agents Must Admit Non-Markov Reward Representations »
Silviu Pitis · Duncan Bailey · Jimmy Ba -
2022 : Invited Talk by Jimmy Ba »
Jimmy Ba -
2022 Poster: Learning to Follow Instructions in Text-Based Games »
Mathieu Tuli · Andrew Li · Pashootan Vaezipoor · Toryn Klassen · Scott Sanner · Sheila McIlraith -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: You Can’t Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2022 Poster: Dataset Distillation using Neural Feature Regression »
Yongchao Zhou · Ehsan Nezhadarya · Jimmy Ba -
2021 Poster: Clockwork Variational Autoencoders »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2021 Poster: Learning Domain Invariant Representations in Goal-conditioned Block MDPs »
Beining Han · Chongyi Zheng · Harris Chan · Keiran Paster · Michael Zhang · Jimmy Ba -
2021 Poster: How does a Neural Network's Architecture Impact its Robustness to Noisy Labels? »
Jingling Li · Mozhi Zhang · Keyulu Xu · John Dickerson · Jimmy Ba -
2020 : Contributed Talk #2: Evaluating Agents Without Rewards »
Brendon Matusch · Danijar Hafner · Jimmy Ba -
2020 : Contributed Talk: Planning from Pixels using Inverse Dynamics Models »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2020 Session: Orals & Spotlights Track 34: Deep Learning »
Tuo Zhao · Jimmy Ba -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Lookahead Optimizer: k steps forward, 1 step back »
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton -
2019 Poster: Graph Normalizing Flows »
Jenny Liu · Aviral Kumar · Jimmy Ba · Jamie Kiros · Kevin Swersky -
2019 Poster: Learning Reward Machines for Partially Observable Reinforcement Learning »
Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith -
2019 Spotlight: Learning Reward Machines for Partially Observable Reinforcement Learning »
Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith -
2018 : Poster Session »
Carl Trimbach · Mennatullah Siam · Rodrigo Toro Icarte · Zhongtian Dai · Sheila McIlraith · Matthew Rahtz · Robert Sheline · Christopher MacLellan · Carolin Lawrence · Stefan Riezler · Dylan Hadfield-Menell · Fang-I Hsiao -
2018 : Teaching Multiple Tasks to an RL Agent using LTL »
Rodrigo Toro Icarte · Sheila McIlraith -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse