Poster

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

Nishant Desai · Andrew Critch · Stuart J Russell

Room 517 AB #164

Keywords: [ Multi-Agent RL ] [ Game Theory and Computational Economics ] [ Decision and Control ] [ Markov Decision Processes ]


Abstract:

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy, i.e., a policy that cannot be improved upon for one principal without making sacrifices for another. Harsanyi's theorem shows that when the principals have a common prior on the outcome distributions of all policies, a Pareto optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we derive a more precise generalization for the sequential decision setting in the case of principals with different priors on the dynamics of the environment. We refer to this generalization as the Negotiable Reinforcement Learning (NRL) framework. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. To gain insight into the dynamics of this new framework, we implement a simple NRL agent and empirically examine its behavior in an illustrative environment.
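For concreteness, here is a minimal sketch in Python of the weight dynamics the abstract describes, assuming each principal's prior is summarized by the likelihood it assigns to each observation; the function names and likelihood values are hypothetical, not taken from the paper. At each step, a principal's weight is multiplied by the probability its prior assigned to what was actually observed, so the agent's objective drifts toward the principal whose beliefs better match the data.

    def update_weights(weights, likelihoods):
        # Multiply each principal's weight by the likelihood its prior
        # assigned to the latest observation, then renormalize.
        new = [w * l for w, l in zip(weights, likelihoods)]
        total = sum(new)
        return [w / total for w in new]

    def combined_utility(weights, utilities):
        # The agent's objective: a weighted sum of the principals' utilities.
        return sum(w * u for w, u in zip(weights, utilities))

    # Two principals start with equal bargaining weight.
    weights = [0.5, 0.5]

    # Hypothetical observation stream: at each step, the probability each
    # principal's prior over the environment dynamics assigned to the
    # observation that actually occurred.
    step_likelihoods = [(0.9, 0.4), (0.8, 0.5), (0.7, 0.6)]
    for likelihoods in step_likelihoods:
        weights = update_weights(weights, likelihoods)
        print(weights)
    # The first principal's prior fits the observations better, so its
    # weight grows and the agent increasingly favors that principal's utility.

The renormalization is only for readability: since the agent maximizes a weighted sum of utilities, only the relative weights matter, and the multiplicative update is equivalent to weighting each principal by the likelihood of the full observation history under that principal's prior.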
