Poster
Reward learning from human preferences and demonstrations in Atari
Borja Ibarz · Jan Leike · Tobias Pohlen · Geoffrey Irving · Shane Legg · Dario Amodei

Wed Dec 05 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #139

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we need humans to communicate an objective to the agent directly. In this work, we combine two approaches to this problem: learning from expert demonstrations and learning from trajectory preferences. We use both to train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games. Additionally, we investigate the fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
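
To illustrate the preference half of the abstract's pipeline, here is a minimal sketch of fitting a reward network to pairwise trajectory preferences with a Bradley-Terry cross-entropy loss, as used in this line of work. It assumes PyTorch; the `RewardModel` (a small MLP over flat features rather than the paper's convolutional network over Atari frames), segment shapes, and helper names are hypothetical stand-ins, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Stand-in for the paper's convolutional reward network: maps one
    observation (here a flat feature vector) to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim) -> per-timestep rewards (batch, T)
        return self.net(obs).squeeze(-1)

def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,    # (batch, T, obs_dim)
                    seg_b: torch.Tensor,    # (batch, T, obs_dim)
                    prefs: torch.Tensor     # (batch,), 1.0 if A preferred
                    ) -> torch.Tensor:
    # Sum predicted reward over each clip to get a segment return.
    ret_a = model(seg_a).sum(dim=1)
    ret_b = model(seg_b).sum(dim=1)
    # Bradley-Terry model: P(A preferred) = sigmoid(R_A - R_B),
    # fit by cross-entropy against the human label.
    return F.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)

# Toy usage: random segments and labels, one gradient step.
model = RewardModel(obs_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
seg_a, seg_b = torch.randn(8, 25, 16), torch.randn(8, 25, 16)
prefs = torch.randint(0, 2, (8,)).float()
loss = preference_loss(model, seg_a, seg_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
```

In the full pipeline described by the abstract, the trained model's predicted reward then replaces the game score when training the DQN-based agent, and expert demonstrations provide an additional source of training signal alongside the preference labels; the sketch above covers only the preference-fitting step.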

Author Information

Borja Ibarz (DeepMind)
Jan Leike (DeepMind)
Toby Pohlen (DeepMind)
Geoffrey Irving (OpenAI)
Shane Legg (DeepMind)
Dario Amodei (OpenAI)

More from the Same Authors

  • 2020 Workshop: Cooperative AI »
    Thore Graepel · Dario Amodei · Vincent Conitzer · Allan Dafoe · Gillian Hadfield · Eric Horvitz · Sarit Kraus · Kate Larson · Yoram Bachrach
  • 2020 Poster: Meta-trained agents implement Bayes-optimal agents »
    Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
  • 2020 Poster: Avoiding Side Effects By Considering Future Tasks »
    Victoria Krakovna · Laurent Orseau · Richard Ngo · Miljan Martic · Shane Legg
  • 2020 Spotlight: Meta-trained agents implement Bayes-optimal agents »
    Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
  • 2020 Poster: Learning to summarize with human feedback »
    Nisan Stiennon · Long Ouyang · Jeffrey Wu · Daniel Ziegler · Ryan Lowe · Chelsea Voss · Alec Radford · Dario Amodei · Paul Christiano
  • 2020 Poster: Language Models are Few-Shot Learners »
    Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared D Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei
  • 2020 Oral: Language Models are Few-Shot Learners »
    Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared D Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei
  • 2017 Poster: Deep Reinforcement Learning from Human Preferences »
    Paul Christiano · Jan Leike · Tom Brown · Miljan Martic · Shane Legg · Dario Amodei
  • 2007 Poster: Temporal Difference with Eligibility Traces Derived from First Principles »
    Marcus Hutter · Shane Legg