Off-Policy Correction For Multi-Agent Reinforcement Learning
Michał Zawalski · Błażej Osiński · Henryk Michalewski · Piotr Miłoś
Event URL: https://openreview.net/forum?id=E9q1Y_aWDc

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite their apparent similarity to the single-agent case, multi-agent problems are often harder to train and to analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows the computation to be distributed across workers with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded -- we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all of its tasks and exceeds state-of-the-art results on some of them.
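To make the importance-sampling correction concrete, below is a minimal sketch of the V-Trace target computation (Espeholt et al., 2018) that MA-Trace builds on. This is not the authors' implementation: the function name, array layout, and default clipping constants are assumptions, and the multi-agent details (e.g., how per-agent policy ratios combine into a joint ratio, and the centralized critic) are only noted in comments as assumptions.

```python
import numpy as np

def vtrace_targets(behavior_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-Trace value targets v_s for a length-T rollout.

    behavior_logp: log mu(a_t | x_t) under the (stale) worker policy, shape [T].
    target_logp:   log pi(a_t | x_t) under the current learner policy, shape [T].
    rewards:       r_t, shape [T].
    values:        critic estimates V(x_t), shape [T].
    bootstrap_value: V(x_T) used to bootstrap past the rollout end.

    In a MARL setting one natural choice (an assumption here, not the
    paper's verbatim method) is to sum per-agent log-probs of a factorized
    joint policy, so the ratio pi/mu is a product of per-agent ratios.
    """
    ratios = np.exp(target_logp - behavior_logp)   # pi/mu importance ratios
    rhos = np.minimum(rho_bar, ratios)             # rho_t = min(rho_bar, pi/mu)
    cs = np.minimum(c_bar, ratios)                 # c_t   = min(c_bar, pi/mu)

    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_tp1 - values)  # clipped TD errors

    # Backward recursion:
    #   v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    acc = 0.0
    corrections = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    return values + corrections                    # V-Trace targets v_s
```

The clipping constants rho_bar and c_bar trade bias for variance; when the learner and worker policies coincide (pi = mu), the targets reduce to ordinary on-policy n-step returns, which is why distributing rollout collection need not degrade training quality.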

Author Information

Michał Zawalski (University of Warsaw)
Błażej Osiński (University of Warsaw)
Henryk Michalewski (University of Warsaw, Google)
Piotr Miłoś (Polish Academy of Sciences, University of Oxford)
