Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality
Stefanos Leonardos · Georgios Piliouras · Kelly Spendlove

Thu Dec 09 08:30 AM -- 10:00 AM (PST) @ Virtual

The interplay between exploration and exploitation in competitive multi-agent learning is still far from well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates, Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality. Complementing recent results about convergence in weighted potential games [16,34], we show that fast convergence of Q-learning in competitive settings obtains regardless of the number of agents and without any parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
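The convergence behavior described in the abstract can be illustrated with a minimal sketch of smooth (Boltzmann) Q-learning dynamics in the simplest zero-sum game, matching pennies. This is not the authors' implementation; the payoff matrix, temperature, and learning rate below are illustrative choices. With any positive exploration rate (temperature), both players' policies spiral into the unique QRE, which for this symmetric game is the uniform strategy:

```python
import numpy as np

def softmax(q, temp):
    """Boltzmann (logit) policy induced by Q-values at exploration rate `temp`."""
    z = np.exp(q / temp)
    return z / z.sum()

# Matching pennies: player 1's payoff matrix; player 2 receives the negative.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

temp = 0.5    # positive exploration rate (temperature)
alpha = 0.1   # learning-rate step size
qx = np.array([1.0, 0.0])  # start player 1 away from equilibrium
qy = np.zeros(2)

for _ in range(5000):
    x, y = softmax(qx, temp), softmax(qy, temp)
    # Smooth Q-learning: Q-values track expected payoffs against the
    # opponent's current mixed strategy.
    qx += alpha * (A @ y - qx)
    qy += alpha * (-A.T @ x - qy)

x, y = softmax(qx, temp), softmax(qy, temp)
print(x, y)  # both policies approach the uniform QRE (0.5, 0.5)
```

Without exploration (temperature tending to zero), these dynamics are known to cycle in zero-sum games; the positive exploration rate is what damps the rotation and yields convergence, consistent with the paper's main claim.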

Author Information

Stefanos Leonardos (Singapore University of Technology and Design)
Georgios Piliouras (Singapore University of Technology and Design)
Kelly Spendlove (University of Oxford)
