Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data, approaching human-level performance and data-efficiency on Atari in our best setting.
Author Information
Max Schwarzer (Mila, Université de Montréal)
Nitarshan Rajkumar (Mila, Université de Montréal)
Michael Noukhovitch (Mila (Université de Montréal))
Master's student at MILA supervised by Aaron Courville and co-supervised by Yoshua Bengio
Ankesh Anand (Mila, University of Montreal)
Laurent Charlin (MILA / U.Montreal)
R Devon Hjelm (Microsoft Research)
Philip Bachman (Microsoft Research)
Aaron Courville (U. Montreal)
More from the Same Authors
2021 Spotlight: A Variational Perspective on Diffusion-Based Generative Models and Score Matching »
Chin-Wei Huang · Jae Hyun Lim · Aaron Courville -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : Behavior Predictive Representations for Generalization in Reinforcement Learning »
Siddhant Agarwal · Aaron Courville · Rishabh Agarwal -
2021 : MIDI-DDSP: Hierarchical Modeling of Music for Detailed Control »
Yusong Wu · Ethan Manilow · Kyle Kastner · Tim Cooijmans · Aaron Courville · Cheng-Zhi Anna Huang · Jesse Engel -
2022 : Attention for Compositional Modularity »
Oleksiy Ostapenko · Pau Rodriguez · Alexandre Lacoste · Laurent Charlin -
2022 : Datasets That Are Not: Evolving Novelty Through Sparsity and Iterated Learning »
Yusong Wu · Kyle Kastner · Tim Cooijmans · Cheng-Zhi Anna Huang · Aaron Courville -
2022 : Unleashing The Potential of Data Sharing in Ensemble Deep Reinforcement Learning »
Zhixuan Lin · Pierluca D'Oro · Evgenii Nikishin · Aaron Courville -
2022 : Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier »
Pierluca D'Oro · Max Schwarzer · Evgenii Nikishin · Pierre-Luc Bacon · Marc Bellemare · Aaron Courville -
2022 : Investigating Multi-task Pretraining and Generalization in Reinforcement Learning »
Adrien Ali Taiga · Rishabh Agarwal · Jesse Farebrother · Aaron Courville · Marc Bellemare -
2022 Poster: Riemannian Diffusion Models »
Chin-Wei Huang · Milad Aghajohari · Joey Bose · Prakash Panangaden · Aaron Courville -
2022 Poster: Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2022 Poster: Myriad: a real-world testbed to bridge trajectory optimization and deep learning »
Nikolaus Howe · Simon Dufort-Labbé · Nitarshan Rajkumar · Pierre-Luc Bacon -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Q&A »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : Machine Learning for Combinatorial Optimization + Q&A »
Maxime Gasse · Simon Bowly · Chris Cameron · Quentin Cappart · Jonas Charfreitag · Laurent Charlin · Shipra Agrawal · Didier Chetelat · Justin Dumouchelle · Ambros Gleixner · Aleksandr Kazachkov · Elias Khalil · Pawel Lichocki · Andrea Lodi · Miles Lubin · Christopher Morris · Dimitri Papageorgiou · Augustin Parjadis · Sebastian Pokutta · Antoine Prouvost · Yuandong Tian · Lara Scavuzzo · Giulia Zarpellon -
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 Poster: Continual Learning via Local Module Composition »
Oleksiy Ostapenko · Pau Rodriguez · Massimo Caccia · Laurent Charlin -
2021 Poster: A Variational Perspective on Diffusion-Based Generative Models and Score Matching »
Chin-Wei Huang · Jae Hyun Lim · Aaron Courville -
2021 Oral: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2021 Poster: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2020 Workshop: Talking to Strangers: Zero-Shot Emergent Communication »
Marie Ossenkopf · Angelos Filos · Abhinav Gupta · Michael Noukhovitch · Angeliki Lazaridou · Jakob Foerster · Kalesha Bullard · Rahma Chaabouni · Eugene Kharitonov · Roberto Dessì -
2020 Workshop: AI for Earth Sciences »
Surya Karthik Mukkavilli · Johanna Hansen · Natasha Dudek · Tom Beucler · Kelly Kochanski · Mayur Mudigonda · Karthik Kashinath · Amy McGovern · Paul D Miller · Chad Frischmann · Pierre Gentine · Gregory Dudek · Aaron Courville · Daniel Kammen · Vipin Kumar -
2020 Poster: Unsupervised Learning of Dense Visual Representations »
Pedro O. Pinheiro · Amjad Almahairi · Ryan Benmalek · Florian Golemo · Aaron Courville -
2020 Poster: Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning »
Massimo Caccia · Pau Rodriguez · Oleksiy Ostapenko · Fabrice Normandin · Min Lin · Lucas Page-Caccia · Issam Hadj Laradji · Irina Rish · Alexandre Lacoste · David Vázquez · Laurent Charlin -
2020 Poster: Synbols: Probing Learning Algorithms with Synthetic Datasets »
Alexandre Lacoste · Pau Rodríguez López · Frederic Branchaud-Charron · Parmida Atighehchian · Massimo Caccia · Issam Hadj Laradji · Alexandre Drouin · Matthew Craddock · Laurent Charlin · David Vázquez -
2020 Session: Orals & Spotlights Track 16: Continual/Meta/Misc Learning »
Laurent Charlin · Cedric Archambeau -
2020 Poster: Deep Reinforcement and InfoMax Learning »
Bogdan Mazoure · Remi Tachet des Combes · Thang Long Doan · Philip Bachman · R Devon Hjelm -
2020 Poster: In search of robust measures of generalization »
Gintare Karolina Dziugaite · Alexandre Drouin · Brady Neal · Nitarshan Rajkumar · Ethan Caballero · Linbo Wang · Ioannis Mitliagkas · Daniel Roy -
2019 Workshop: Emergent Communication: Towards Natural Language »
Abhinav Gupta · Michael Noukhovitch · Cinjon Resnick · Natasha Jaques · Angelos Filos · Marie Ossenkopf · Angeliki Lazaridou · Jakob Foerster · Ryan Lowe · Douwe Kiela · Kyunghyun Cho -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: Ordered Memory »
Yikang Shen · Shawn Tan · Arian Hosseini · Zhouhan Lin · Alessandro Sordoni · Aaron Courville -
2019 Poster: Learning Representations by Maximizing Mutual Information Across Views »
Philip Bachman · R Devon Hjelm · William Buchwalter -
2019 Poster: Unsupervised State Representation Learning in Atari »
Ankesh Anand · Evan Racah · Sherjil Ozair · Yoshua Bengio · Marc-Alexandre Côté · R Devon Hjelm -
2019 Poster: Online Continual Learning with Maximal Interfered Retrieval »
Rahaf Aljundi · Eugene Belilovsky · Tinne Tuytelaars · Laurent Charlin · Massimo Caccia · Min Lin · Lucas Page-Caccia -
2019 Poster: MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis »
Kundan Kumar · Rithesh Kumar · Thibault de Boissiere · Lucas Gestin · Wei Zhen Teoh · Jose Sotelo · Alexandre de Brébisson · Yoshua Bengio · Aaron Courville -
2019 Poster: No-Press Diplomacy: Modeling Multi-Agent Gameplay »
Philip Paquette · Yuchen Lu · SETON STEVEN BOCCO · Max Smith · Satya O.-G. · Jonathan K. Kummerfeld · Joelle Pineau · Satinder Singh · Aaron Courville -
2019 Poster: On Adversarial Mixup Resynthesis »
Christopher Beckham · Sina Honari · Alex Lamb · Vikas Verma · Farnoosh Ghadiri · R Devon Hjelm · Yoshua Bengio · Chris Pal -
2019 Poster: Exact Combinatorial Optimization with Graph Convolutional Neural Networks »
Maxime Gasse · Didier Chetelat · Nicola Ferroni · Laurent Charlin · Andrea Lodi -
2018 : Spotlight Talks I »
Juan Leni · Michael Spranger · Ben Bogin · Shane Steinert-Threlkeld · Nicholas Tomlin · Fushan Li · Michael Noukhovitch · Tushar Jain · Jason Lee · Yen-Ling Kuo · Josefina Correa · Karol Hausman -
2018 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Erik Wijmans · Samyak Datta · Ethan Perez · Mateusz Malinowski · Stefan Lee · Peter Anderson · Aaron Courville · Jeremie MARY · Dhruv Batra · Devi Parikh · Olivier Pietquin · Chiori HORI · Tim Marks · Anoop Cherian -
2018 Poster: Towards Deep Conversational Recommendations »
Raymond Li · Samira Ebrahimi Kahou · Hannes Schulz · Vincent Michalski · Laurent Charlin · Chris Pal -
2018 Poster: Improving Explorability in Variational Inference with Annealed Variational Objectives »
Chin-Wei Huang · Shawn Tan · Alexandre Lacoste · Aaron Courville -
2018 Poster: Towards Text Generation with Adversarially Learned Neural Outlines »
Sandeep Subramanian · Sai Rajeswar Mudumba · Alessandro Sordoni · Adam Trischler · Aaron Courville · Chris Pal -
2017 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary -
2017 Poster: Improved Training of Wasserstein GANs »
Ishaan Gulrajani · Faruk Ahmed · Martin Arjovsky · Vincent Dumoulin · Aaron Courville -
2017 Poster: GibbsNet: Iterative Adversarial Inference for Deep Graphical Models »
Alex Lamb · R Devon Hjelm · Yaroslav Ganin · Joseph Paul Cohen · Aaron Courville · Yoshua Bengio -
2017 Poster: Modulating early visual processing by language »
Harm de Vries · Florian Strub · Jeremie Mary · Hugo Larochelle · Olivier Pietquin · Aaron Courville -
2017 Spotlight: Modulating early visual processing by language »
Harm de Vries · Florian Strub · Jeremie Mary · Hugo Larochelle · Olivier Pietquin · Aaron Courville -
2016 : Discussion panel »
Ian Goodfellow · Soumith Chintala · Arthur Gretton · Sebastian Nowozin · Aaron Courville · Yann LeCun · Emily Denton -
2016 : Adversarially Learned Inference (ALI) and BiGANs »
Aaron Courville -
2016 Poster: Professor Forcing: A New Algorithm for Training Recurrent Networks »
Alex M Lamb · Anirudh Goyal · Ying Zhang · Saizheng Zhang · Aaron Courville · Yoshua Bengio -
2016 Poster: An Architecture for Deep, Hierarchical Generative Models »
Philip Bachman -
2015 : Introduction »
Aaron Courville -
2015 Workshop: Multimodal Machine Learning »
Louis-Philippe Morency · Tadas Baltrusaitis · Aaron Courville · Kyunghyun Cho -
2015 Poster: Data Generation as Sequential Decision Making »
Philip Bachman · Doina Precup -
2015 Spotlight: Data Generation as Sequential Decision Making »
Philip Bachman · Doina Precup -
2015 Poster: A Recurrent Latent Variable Model for Sequential Data »
Junyoung Chung · Kyle Kastner · Laurent Dinh · Kratarth Goel · Aaron Courville · Yoshua Bengio -
2014 Poster: Learning with Pseudo-Ensembles »
Philip Bachman · Ouais Alsharif · Doina Precup -
2014 Poster: Content-based recommendations with Poisson factorization »
Prem Gopalan · Laurent Charlin · David Blei -
2014 Poster: Generative Adversarial Nets »
Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio -
2013 Poster: Multi-Prediction Deep Boltzmann Machines »
Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio -
2011 Poster: On Tracking The Partition Function »
Guillaume Desjardins · Aaron Courville · Yoshua Bengio -
2009 Poster: An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism »
Aaron Courville · Douglas Eck · Yoshua Bengio -
2009 Session: Oral Session 3: Deep Learning and Network Models »
Aaron Courville -
2008 Session: Oral session 11: Attention and Mind »
Aaron Courville -
2007 Spotlight: The rat as particle filter »
Nathaniel D Daw · Aaron Courville -
2007 Poster: The rat as particle filter »
Nathaniel D Daw · Aaron Courville -
2006 Poster: Automated Hierarchy Discovery for Planning in Partially Observable Domains »
Laurent Charlin · Pascal Poupart · Romy Shioda