Regularized Softmax Deep Multi-Agent Q-Learning
Ling Pan · Tabish Rashid · Bei Peng · Longbo Huang · Shimon Whiteson

Wed Dec 08 12:30 AM -- 02:00 AM (PST)
Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular $Q$-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from a more severe overestimation in practice than previously acknowledged, which is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline, and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent $Q$-Learning, is general and can be applied to any $Q$-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
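The abstract names two mechanisms: a softmax operator in place of the max in the $Q$-learning target, and a regularization term penalizing joint action-values that drift from a baseline. The sketch below is a minimal illustration of how these two pieces could combine, assuming a Boltzmann softmax and a squared-deviation penalty; the function names (`softmax_operator`, `res_loss`), the hyperparameters `beta` and `lam`, and the choice of baseline are illustrative assumptions, not the authors' exact formulation, and the paper's efficient approximation of the softmax over the exponentially large joint-action space is not reproduced here.

```python
import torch

def softmax_operator(q_values: torch.Tensor, beta: float = 5.0) -> torch.Tensor:
    """Boltzmann softmax value estimate over the (joint-)action dimension.

    sm_beta(Q)(s) = sum_a [ exp(beta * Q(s,a)) / sum_a' exp(beta * Q(s,a')) ] * Q(s,a)

    Interpolates between the mean (beta -> 0) and the max (beta -> inf) of the
    action-values, which can soften the overestimation bias of the hard max.
    NOTE: enumerating all joint actions is infeasible in MARL; the paper uses
    an efficient approximation that this naive version does not implement.
    """
    weights = torch.softmax(beta * q_values, dim=-1)   # (batch, n_actions)
    return (weights * q_values).sum(dim=-1)            # (batch,)

def res_loss(q_taken: torch.Tensor,       # Q_tot(s, u) for the taken joint action, (batch,)
             reward: torch.Tensor,        # (batch,)
             next_joint_q: torch.Tensor,  # target-net joint action-values at s', (batch, n_actions)
             baseline: torch.Tensor,      # baseline the joint value is regularized toward, (batch,)
             gamma: float = 0.99,
             lam: float = 0.1,            # hypothetical regularization weight
             beta: float = 5.0) -> torch.Tensor:
    """Hypothetical RES-style loss: softmax TD target plus a penalty on large
    joint action-values that deviate from a baseline (penalty form assumed)."""
    with torch.no_grad():
        target = reward + gamma * softmax_operator(next_joint_q, beta)
    td_loss = torch.mean((q_taken - target) ** 2)
    reg_loss = torch.mean((q_taken - baseline) ** 2)   # discourages runaway value estimates
    return td_loss + lam * reg_loss
```

Since the approach is described as general, a sketch like this would sit on top of any $Q$-learning based MARL algorithm (e.g., replacing QMIX's standard TD loss), with the baseline and softmax approximation chosen as in the paper.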

Author Information

Ling Pan (Tsinghua University)
Tabish Rashid (University of Oxford)
Bei Peng (University of Liverpool)
Longbo Huang (IIIS, Tsinghua University)
Shimon Whiteson (University of Oxford)