Poster
Deep Reinforcement Learning from Human Preferences
Paul Christiano · Jan Leike · Tom Brown · Miljan Martic · Shane Legg · Dario Amodei

Wed Dec 06 06:30 PM -- 10:30 PM (PST) @ Pacific Ballroom #1

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. Our approach separates learning the goal from learning the behavior to achieve it. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on about 0.1% of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any which have been previously learned from human feedback.
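The core of the approach is fitting a reward model to human preferences between pairs of trajectory segments via a Bradley-Terry style model, then training the policy against that learned reward. The following is a minimal sketch of that reward-learning step, not the authors' code: it assumes a PyTorch implementation, and the class name `RewardModel`, the function `preference_loss`, and all dimensions and hyperparameters are illustrative.

```python
# Sketch of preference-based reward learning (hypothetical, not the paper's code).
# A reward model r_theta is trained so that the Bradley-Terry probability of
# preferring segment a over segment b matches the human labels.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, steps, obs_dim) -> summed predicted reward, shape (batch,)
        return self.net(segment).squeeze(-1).sum(dim=1)

def preference_loss(model, seg_a, seg_b, prefer_a):
    """Cross-entropy on P[a > b] = exp(R_a) / (exp(R_a) + exp(R_b))."""
    r_a = model.segment_return(seg_a)
    r_b = model.segment_return(seg_b)
    logits = r_a - r_b  # log-odds of preferring segment a over segment b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefer_a)

# Illustrative training step on a batch of human-labelled segment pairs.
model = RewardModel(obs_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a = torch.randn(32, 25, 8)                 # 32 pairs of 25-step segments
seg_b = torch.randn(32, 25, 8)
prefer_a = torch.randint(0, 2, (32,)).float()  # human labels (1 = prefer a)
loss = preference_loss(model, seg_a, seg_b, prefer_a)
opt.zero_grad(); loss.backward(); opt.step()
```

In the full pipeline described in the abstract, the learned reward then stands in for the environment reward when training the policy with a standard RL algorithm, so the human labels are needed for only a small fraction of the agent's interactions.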

Author Information

Paul Christiano (OpenAI)
Jan Leike (DeepMind)
Tom Brown (Google Brain)
Miljan Martic (DeepMind)
Shane Legg (DeepMind)
Dario Amodei (OpenAI)

More from the Same Authors

  • 2020 Workshop: Cooperative AI »
    Thore Graepel · Dario Amodei · Vincent Conitzer · Allan Dafoe · Gillian Hadfield · Eric Horvitz · Sarit Kraus · Kate Larson · Yoram Bachrach
  • 2020 Poster: Meta-trained agents implement Bayes-optimal agents »
    Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
  • 2020 Poster: Avoiding Side Effects By Considering Future Tasks »
    Victoria Krakovna · Laurent Orseau · Richard Ngo · Miljan Martic · Shane Legg
  • 2020 Spotlight: Meta-trained agents implement Bayes-optimal agents »
    Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
  • 2020 Poster: Learning to summarize with human feedback »
    Nisan Stiennon · Long Ouyang · Jeffrey Wu · Daniel Ziegler · Ryan Lowe · Chelsea Voss · Alec Radford · Dario Amodei · Paul Christiano
  • 2020 Poster: Language Models are Few-Shot Learners »
    Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared D Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei
  • 2020 Oral: Language Models are Few-Shot Learners »
    Tom B Brown · Benjamin Mann · Nick Ryder · Melanie Subbiah · Jared D Kaplan · Prafulla Dhariwal · Arvind Neelakantan · Pranav Shyam · Girish Sastry · Amanda Askell · Sandhini Agarwal · Ariel Herbert-Voss · Gretchen M Krueger · Tom Henighan · Rewon Child · Aditya Ramesh · Daniel Ziegler · Jeffrey Wu · Clemens Winter · Chris Hesse · Mark Chen · Eric Sigler · Mateusz Litwin · Scott Gray · Benjamin Chess · Jack Clark · Christopher Berner · Sam McCandlish · Alec Radford · Ilya Sutskever · Dario Amodei
  • 2018 Poster: Reward learning from human preferences and demonstrations in Atari »
    Borja Ibarz · Jan Leike · Tobias Pohlen · Geoffrey Irving · Shane Legg · Dario Amodei
  • 2007 Poster: Temporal Difference with Eligibility Traces Derived from First Principles »
    Marcus Hutter · Shane Legg