It is commonly believed that an agent making decisions on behalf of two or more principals with different utility functions should adopt a Pareto optimal policy, i.e. a policy that cannot be improved for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals share a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case where the principals hold different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. To gain insight into the dynamics of this new framework, we implement a basic NRL agent and empirically examine its behavior in a simple environment.
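The weight dynamics described above can be illustrated with a small sketch. It assumes, as one natural reading of "how well the agent's observations conform with that principal's prior", that each principal's weight is multiplied by the probability that principal's prior assigned to each new observation and then renormalized. It also simplifies the sequential setting to a one-step choice in which each principal's expected utility for each action (computed under that principal's own beliefs) is supplied as input. The function names, the coin-flip priors, and the utilities are hypothetical illustrations, not the paper's implementation.

```python
from typing import Dict, List, Sequence


def update_weights(weights: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Reweight principals by how probable the latest observation was under each one's prior.

    Assumption: a multiplicative, Bayes-style update followed by renormalization.
    """
    if len(weights) != len(likelihoods):
        raise ValueError("need one likelihood per principal")
    raw = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("observation has zero probability under every principal's prior")
    return [r / total for r in raw]


def choose_action(weights: Sequence[float], utilities: Sequence[Dict[str, float]]) -> str:
    """Pick the action maximizing the weighted sum of the principals' expected utilities.

    utilities[i][a] is principal i's expected utility for action a, evaluated under
    principal i's own beliefs (given as input in this simplified one-step sketch).
    """
    actions = list(utilities[0])
    return max(actions, key=lambda a: sum(w * u[a] for w, u in zip(weights, utilities)))


if __name__ == "__main__":
    # Hypothetical priors: principal 0 believes P(heads) = 0.9, principal 1 believes 0.3.
    p_heads = [0.9, 0.3]
    # Hypothetical utilities: principal 0 prefers action "A", principal 1 prefers "B".
    utilities = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 2.0}]

    weights = [0.5, 0.5]
    print(choose_action(weights, utilities))  # B

    # Observe heads twice; weight shifts toward the principal whose prior predicted it better.
    for _ in range(2):
        weights = update_weights(weights, p_heads)
    print(weights, choose_action(weights, utilities))  # chosen action is now A
```

With equal initial weights the agent favors principal 1's preferred action, but after two observations that principal 0's prior predicted better, the weights move to roughly 0.9 and 0.1 and the choice flips. Renormalizing is not needed for the argmax itself; it is kept only so the weights stay readable as proportions.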
Author Information
Nishant Desai (DeepScale)
Andrew Critch (UC Berkeley)
Stuart J Russell (UC Berkeley)