Timezone: »
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
Author Information
Rishabh Agarwal (Google Research, Brain Team)
Max Schwarzer (Mila, Université de Montréal)
Pablo Samuel Castro (Google)
Aaron Courville (U. Montreal)
Marc Bellemare (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Tue. Dec 7th 04:30 -- 06:00 PM Room Virtual
More from the Same Authors
-
2020 : GANterpretations »
Pablo Samuel Castro -
2021 Spotlight: Neural Additive Models: Interpretable Machine Learning with Neural Nets »
Rishabh Agarwal · Levi Melnick · Nicholas Frosst · Xuezhou Zhang · Ben Lengerich · Rich Caruana · Geoffrey Hinton -
2021 Spotlight: A Variational Perspective on Diffusion-Based Generative Models and Score Matching »
Chin-Wei Huang · Jae Hyun Lim · Aaron Courville -
2021 : Lifting the veil on hyper-parameters for value-baseddeep reinforcement learning »
João Madeira Araújo · Johan Obando Ceron · Pablo Samuel Castro -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : Lifting the veil on hyper-parameters for value-baseddeep reinforcement learning »
João Madeira Araújo · Johan Obando Ceron · Pablo Samuel Castro -
2021 : Behavior Predictive Representations for Generalization in Reinforcement Learning »
Siddhant Agarwal · Aaron Courville · Rishabh Agarwal -
2021 : MIDI-DDSP: Hierarchical Modeling of Music for Detailed Control »
Yusong Wu · Ethan Manilow · Kyle Kastner · Tim Cooijmans · Aaron Courville · Cheng-Zhi Anna Huang · Jesse Engel -
2022 : A Novel Stochastic Gradient Descent Algorithm for LearningPrincipal Subspaces »
Charline Le Lan · Joshua Greaves · Jesse Farebrother · Mark Rowland · Fabian Pedregosa · Rishabh Agarwal · Marc Bellemare -
2022 : Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks »
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare -
2022 : Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks »
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare -
2022 : Datasets That Are Not: Evolving Novelty Through Sparsity and Iterated Learning »
Yusong Wu · Kyle Kastner · Tim Cooijmans · Cheng-Zhi Anna Huang · Aaron Courville -
2022 : Unleashing The Potential of Data Sharing in Ensemble Deep Reinforcement Learning »
Zhixuan Lin · Pierluca D'Oro · Evgenii Nikishin · Aaron Courville -
2022 : Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks »
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare -
2022 : Variance Double-Down: The Small Batch Size Anomaly in Multistep Deep Reinforcement Learning »
Johan Obando Ceron · Marc Bellemare · Pablo Samuel Castro -
2022 : Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier »
Pierluca D'Oro · Max Schwarzer · Evgenii Nikishin · Pierre-Luc Bacon · Marc Bellemare · Aaron Courville -
2022 : Investigating Multi-task Pretraining and Generalization in Reinforcement Learning »
Adrien Ali Taiga · Rishabh Agarwal · Jesse Farebrother · Aaron Courville · Marc Bellemare -
2022 Spotlight: Lightning Talks 4A-4 »
Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare -
2022 Spotlight: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning »
Yunhao Tang · Remi Munos · Mark Rowland · Bernardo Avila Pires · Will Dabney · Marc Bellemare -
2022 : Panel RL Benchmarks »
Minmin Chen · Pablo Samuel Castro · Caglar Gulcehre · Tony Jebara · Peter Stone -
2022 Workshop: Broadening Research Collaborations »
Sara Hooker · Rosanne Liu · Pablo Samuel Castro · FatemehSadat Mireshghallah · Sunipa Dev · Benjamin Rosman · João Madeira Araújo · Savannah Thais · Sara Hooker · Sunny Sanyal · Tejumade Afonja · Swapneel Mehta · Tyler Zhu -
2022 Poster: Riemannian Diffusion Models »
Chin-Wei Huang · Milad Aghajohari · Joey Bose · Prakash Panangaden · Aaron Courville -
2022 Poster: Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2022 Poster: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning »
Yunhao Tang · Remi Munos · Mark Rowland · Bernardo Avila Pires · Will Dabney · Marc Bellemare -
2021 : Behavior Predictive Representations for Generalization in Reinforcement Learning »
Siddhant Agarwal · Aaron Courville · Rishabh Agarwal -
2021 : Invited Talk: Pablo Castro (Google Brain) on Estimating Policy Functions in Payment Systems using Reinforcement Learning »
Pablo Samuel Castro -
2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? »
Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Q&A »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 Poster: Pretraining Representations for Data-Efficient Reinforcement Learning »
Max Schwarzer · Nitarshan Rajkumar · Michael Noukhovitch · Ankesh Anand · Laurent Charlin · R Devon Hjelm · Philip Bachman · Aaron Courville -
2021 Poster: A Variational Perspective on Diffusion-Based Generative Models and Score Matching »
Chin-Wei Huang · Jae Hyun Lim · Aaron Courville -
2021 Poster: Neural Additive Models: Interpretable Machine Learning with Neural Nets »
Rishabh Agarwal · Levi Melnick · Nicholas Frosst · Xuezhou Zhang · Ben Lengerich · Rich Caruana · Geoffrey Hinton -
2021 : Lifting the veil on hyper-parameters for value-baseddeep reinforcement learning »
João Madeira Araújo · Johan Obando Ceron · Pablo Samuel Castro -
2021 Poster: The Difficulty of Passive Learning in Deep Reinforcement Learning »
Georg Ostrovski · Pablo Samuel Castro · Will Dabney -
2021 Poster: MICo: Improved representations via sampling-based state similarity for Markov decision processes »
Pablo Samuel Castro · Tyler Kastner · Prakash Panangaden · Mark Rowland -
2020 : Contributed Talk #3: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning »
Rishabh Agarwal · Marlos C. Machado · Pablo Samuel Castro · Marc Bellemare -
2020 Workshop: AI for Earth Sciences »
Surya Karthik Mukkavilli · Johanna Hansen · Natasha Dudek · Tom Beucler · Kelly Kochanski · Mayur Mudigonda · Karthik Kashinath · Amy McGovern · Paul D Miller · Chad Frischmann · Pierre Gentine · Gregory Dudek · Aaron Courville · Daniel Kammen · Vipin Kumar -
2020 : Panel discussion »
Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel -
2020 : Invited talk: Marc Bellemare "Autonomous navigation of stratospheric balloons using reinforcement learning" »
Marc Bellemare -
2020 Poster: Unsupervised Learning of Dense Visual Representations »
Pedro O. Pinheiro · Amjad Almahairi · Ryan Benmalek · Florian Golemo · Aaron Courville -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: Ordered Memory »
Yikang Shen · Shawn Tan · Arian Hosseini · Zhouhan Lin · Alessandro Sordoni · Aaron Courville -
2019 Poster: MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis »
Kundan Kumar · Rithesh Kumar · Thibault de Boissiere · Lucas Gestin · Wei Zhen Teoh · Jose Sotelo · Alexandre de Brébisson · Yoshua Bengio · Aaron Courville -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: No-Press Diplomacy: Modeling Multi-Agent Gameplay »
Philip Paquette · Yuchen Lu · SETON STEVEN BOCCO · Max Smith · Satya O.-G. · Jonathan K. Kummerfeld · Joelle Pineau · Satinder Singh · Aaron Courville -
2018 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Erik Wijmans · Samyak Datta · Ethan Perez · Mateusz Malinowski · Stefan Lee · Peter Anderson · Aaron Courville · Jeremie MARY · Dhruv Batra · Devi Parikh · Olivier Pietquin · Chiori HORI · Tim Marks · Anoop Cherian -
2018 Poster: Improving Explorability in Variational Inference with Annealed Variational Objectives »
Chin-Wei Huang · Shawn Tan · Alexandre Lacoste · Aaron Courville -
2018 Poster: Towards Text Generation with Adversarially Learned Neural Outlines »
Sandeep Subramanian · Sai Rajeswar Mudumba · Alessandro Sordoni · Adam Trischler · Aaron Courville · Chris Pal -
2017 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary -
2017 Poster: Improved Training of Wasserstein GANs »
Ishaan Gulrajani · Faruk Ahmed · Martin Arjovsky · Vincent Dumoulin · Aaron Courville -
2017 Poster: GibbsNet: Iterative Adversarial Inference for Deep Graphical Models »
Alex Lamb · R Devon Hjelm · Yaroslav Ganin · Joseph Paul Cohen · Aaron Courville · Yoshua Bengio -
2017 Poster: Modulating early visual processing by language »
Harm de Vries · Florian Strub · Jeremie Mary · Hugo Larochelle · Olivier Pietquin · Aaron Courville -
2017 Spotlight: Modulating early visual processing by language »
Harm de Vries · Florian Strub · Jeremie Mary · Hugo Larochelle · Olivier Pietquin · Aaron Courville -
2016 : Discussion panel »
Ian Goodfellow · Soumith Chintala · Arthur Gretton · Sebastian Nowozin · Aaron Courville · Yann LeCun · Emily Denton -
2016 : Adversarially Learned Inference (ALI) and BiGANs »
Aaron Courville -
2016 Poster: Unifying Count-Based Exploration and Intrinsic Motivation »
Marc Bellemare · Sriram Srinivasan · Georg Ostrovski · Tom Schaul · David Saxton · Remi Munos -
2016 Poster: Professor Forcing: A New Algorithm for Training Recurrent Networks »
Alex M Lamb · Anirudh Goyal · Ying Zhang · Saizheng Zhang · Aaron Courville · Yoshua Bengio -
2016 Poster: Safe and Efficient Off-Policy Reinforcement Learning »
Remi Munos · Tom Stepleton · Anna Harutyunyan · Marc Bellemare -
2015 : Introduction »
Aaron Courville -
2015 Workshop: Multimodal Machine Learning »
Louis-Philippe Morency · Tadas Baltrusaitis · Aaron Courville · Kyunghyun Cho -
2015 Poster: A Recurrent Latent Variable Model for Sequential Data »
Junyoung Chung · Kyle Kastner · Laurent Dinh · Kratarth Goel · Aaron Courville · Yoshua Bengio -
2014 Poster: Generative Adversarial Nets »
Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio -
2013 Poster: Multi-Prediction Deep Boltzmann Machines »
Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio -
2011 Poster: On Tracking The Partition Function »
Guillaume Desjardins · Aaron Courville · Yoshua Bengio -
2009 Poster: An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism »
Aaron Courville · Douglas Eck · Yoshua Bengio -
2009 Session: Oral Session 3: Deep Learning and Network Models »
Aaron Courville -
2008 Session: Oral session 11: Attention and Mind »
Aaron Courville -
2007 Spotlight: The rat as particle filter »
Nathaniel D Daw · Aaron Courville -
2007 Poster: The rat as particle filter »
Nathaniel D Daw · Aaron Courville