Poster
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Sriram Srinivasan · Marc Lanctot · Vinicius Zambaldi · Julien Perolat · Karl Tuyls · Remi Munos · Michael Bowling

Wed Dec 05 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #158

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
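The abstract connects policy-gradient/actor-critic updates to regret minimization in the one-shot and tabular cases. As a rough intuition only, the sketch below shows a generic tabular softmax actor-critic update run in self-play on rock-paper-scissors; it is not the paper's update rules, and the payoff matrix, learning rate, and helper names are illustrative assumptions. The "critic" here is the exact expected payoff of each action against the opponent's current mixed strategy, and the time-averaged policy is printed for comparison with the uniform Nash equilibrium.

import numpy as np

# Rock-paper-scissors payoff matrix for the row player (illustrative; the
# paper's experiments use poker benchmarks, not this game).
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def policy_gradient_step(logits, opponent_policy, lr=0.05):
    # "Critic": exact expected payoff of each action versus the opponent's
    # current mixed strategy; "actor": softmax policy ascending the advantage.
    policy = softmax(logits)
    q = PAYOFF @ opponent_policy      # action values
    v = policy @ q                    # baseline (state value)
    grad = policy * (q - v)           # gradient of expected return w.r.t. logits
    return logits + lr * grad

row_logits = np.zeros(3)
col_logits = np.zeros(3)
avg_row = np.zeros(3)
for t in range(1, 20001):
    row_policy = softmax(row_logits)
    col_policy = softmax(col_logits)
    avg_row += (row_policy - avg_row) / t   # running average of the row policy
    # Because PAYOFF is antisymmetric (-PAYOFF.T == PAYOFF), the same update
    # applies to the column player with the roles swapped.
    row_logits = policy_gradient_step(row_logits, col_policy)
    col_logits = policy_gradient_step(col_logits, row_policy)

print("time-averaged row policy:", avg_row)  # compare with the uniform Nash (1/3, 1/3, 1/3)

The paper's actual algorithms, their convergence guarantees, and the extension to sequential, partially observable games with function approximation go beyond this toy example, which only illustrates the one-shot tabular setting referenced in the abstract.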

Author Information

Sriram Srinivasan (Google)
Marc Lanctot (DeepMind)
Vinicius Zambaldi (DeepMind)
Julien Perolat (DeepMind)
Karl Tuyls (DeepMind)
Remi Munos (DeepMind)
Michael Bowling (DeepMind / University of Alberta)
