Poster
Playing hard exploration games by watching YouTube
Yusuf Aytar · Tobias Pfaff · David Budden · Thomas Paine · Ziyu Wang · Nando de Freitas

Wed Dec 05 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #142

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a method that overcomes these limitations in two stages. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to learn a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma’s Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
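The second stage described above, turning one embedded demonstration video into an imitation reward, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the demonstration frames and the agent's observations have already been mapped into the shared embedding space, and it assumes a checkpoint scheme in which the agent is rewarded once for reaching each successive demonstration embedding. The names `CheckpointReward`, `stride` and `threshold`, and the reward value of 1.0, are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class CheckpointReward:
    """Hypothetical imitation reward built from one embedded demonstration.

    Every `stride`-th demonstration embedding becomes a checkpoint. The agent
    receives a reward of 1.0 the first time its own embedding is close enough
    to the next unreached checkpoint, and 0.0 otherwise.
    """

    def __init__(
        self,
        demo_embeddings: Sequence[Sequence[float]],
        stride: int = 2,         # assumed spacing between checkpoints
        threshold: float = 0.9,  # assumed similarity needed to count as reached
    ) -> None:
        self.checkpoints: List[Sequence[float]] = list(demo_embeddings[::stride])
        self.threshold = threshold
        self.next_index = 0  # index of the next checkpoint to reach

    def __call__(self, agent_embedding: Sequence[float]) -> float:
        if self.next_index >= len(self.checkpoints):
            return 0.0  # all checkpoints already collected
        target = self.checkpoints[self.next_index]
        if cosine_similarity(agent_embedding, target) >= self.threshold:
            self.next_index += 1
            return 1.0
        return 0.0


# Toy usage with 2-D vectors standing in for learned embeddings.
demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
reward_fn = CheckpointReward(demo, stride=2, threshold=0.9)
rewards = [reward_fn(e) for e in ([0.0, 1.0], [1.0, 0.05], [0.0, 1.0])]
```

Because checkpoints must be reached in order, the reward depends only on the demonstration video and never on the environment's own score, which matches the abstract's claim that the agent can learn without environment rewards.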

Author Information

Yusuf Aytar (DeepMind)
Tobias Pfaff (DeepMind)
David Budden (DeepMind)
Thomas Paine (DeepMind)
Ziyu Wang (DeepMind)
Nando de Freitas (DeepMind)

More from the Same Authors

  • 2020 : Learning Mesh-Based Simulation with Graph Networks
    Tobias Pfaff · Meire Fortunato · Alvaro Sanchez Gonzalez · Peter Battaglia
  • 2021 : Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies
    Dushyant Rao · Fereshteh Sadeghi · Leonard Hasenclever · Markus Wulfmeier · Martina Zambelli · Giulia Vezzani · Dhruva Tirumala · Yusuf Aytar · Josh Merel · Nicolas Heess · Raia Hadsell
  • 2021 : Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation
    Todor Davchev · Oleg Sushkov · Jean-Baptiste Regli · Stefan Schaal · Yusuf Aytar · Markus Wulfmeier · Jonathan Scholz
  • 2021 : Retrospective Panel
    Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal
  • 2021 Poster: Active Offline Policy Selection
    Ksenia Konyushova · Yutian Chen · Thomas Paine · Caglar Gulcehre · Cosmin Paduraru · Daniel Mankowitz · Misha Denil · Nando de Freitas
  • 2020 : Panel
    Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar
  • 2020 : Offline RL
    Nando de Freitas
  • 2020 Poster: Critic Regularized Regression
    Ziyu Wang · Alexander Novikov · Konrad Zolna · Josh Merel · Jost Tobias Springenberg · Scott Reed · Bobak Shahriari · Noah Siegel · Caglar Gulcehre · Nicolas Heess · Nando de Freitas
  • 2020 Poster: Modular Meta-Learning with Shrinkage
    Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas
  • 2020 Spotlight: Modular Meta-Learning with Shrinkage
    Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas
  • 2020 Poster: A Combinatorial Perspective on Transfer Learning
    Jianan Wang · Eren Sezener · David Budden · Marcus Hutter · Joel Veness
  • 2020 Poster: RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
    Caglar Gulcehre · Ziyu Wang · Alexander Novikov · Thomas Paine · Sergio Gómez · Konrad Zolna · Rishabh Agarwal · Josh Merel · Daniel Mankowitz · Cosmin Paduraru · Gabriel Dulac-Arnold · Jerry Li · Mohammad Norouzi · Matthew Hoffman · Nicolas Heess · Nando de Freitas
  • 2020 Poster: Online Learning in Contextual Bandits using Gated Linear Networks
    Eren Sezener · Marcus Hutter · David Budden · Jianan Wang · Joel Veness
  • 2020 Poster: Gaussian Gated Linear Networks
    David Budden · Adam Marblestone · Eren Sezener · Tor Lattimore · Gregory Wayne · Joel Veness
  • 2019 : Poster Session
    Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
  • 2019 Workshop: Science meets Engineering of Deep Learning
    Levent Sagun · Caglar Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas
  • 2019 : Welcoming remarks and introduction
    Levent Sagun · Caglar Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas
  • 2019 Poster: Learning Compositional Neural Programs with Recursive Tree Search and Planning
    Thomas PIERROT · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas
  • 2019 Spotlight: Learning Compositional Neural Programs with Recursive Tree Search and Planning
    Thomas PIERROT · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas
  • 2018 : Invited Talk 5: Nando de Freitas
    Nando de Freitas
  • 2018 : Poster Session 1 + Coffee
    Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang
  • 2017 Poster: Robust Imitation of Diverse Behaviors
    Ziyu Wang · Josh Merel · Scott Reed · Nando de Freitas · Gregory Wayne · Nicolas Heess
  • 2017 Tutorial: Deep Learning: Practice and Trends
    Nando de Freitas · Scott Reed · Oriol Vinyals
  • 2016 Workshop: Neural Abstract Machines & Program Induction
    Matko Bošnjak · Nando de Freitas · Tejas Kulkarni · Arvind Neelakantan · Scott E Reed · Sebastian Riedel · Tim Rocktäschel
  • 2016 : Nando De Freitas
    Nando de Freitas
  • 2016 : Learning To Optimize
    Nando de Freitas
  • 2016 Poster: Learning to learn by gradient descent by gradient descent
    Marcin Andrychowicz · Misha Denil · Sergio Gómez · Matthew Hoffman · David Pfau · Tom Schaul · Nando de Freitas
  • 2015 Workshop: Bayesian Optimization: Scalability and Flexibility
    Bobak Shahriari · Ryan Adams · Nando de Freitas · Amar Shah · Roberto Calandra