Timezone: »
The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games [15,34], we show that fast convergence of Q-learning in competitive settings obtains regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
Author Information
Stefanos Leonardos (Singapore University of Technology and Design)
Kelly Spendlove (University of Oxford)
Georgios Piliouras (Singapore University of Technology and Design)
More from the Same Authors
-
2021 Spotlight: Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality »
Stefanos Leonardos · Georgios Piliouras · Kelly Spendlove -
2021 : Learning in Matrix Games can be Arbitrarily Complex »
Gabriel Andrade · Rafael Frongillo · Georgios Piliouras -
2021 : Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games »
Stefanos Leonardos · Will Overman · Ioannis Panageas · Georgios Piliouras -
2021 : Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality »
Stefanos Leonardos · Kelly Spendlove · Georgios Piliouras -
2021 : Learning in Matrix Games can be Arbitrarily Complex »
Gabriel Andrade · Rafael Frongillo · Georgios Piliouras -
2021 : Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games »
Stefanos Leonardos · Will Overman · Ioannis Panageas · Georgios Piliouras -
2023 Poster: The Best of Both Worlds in Network Population Games: Reaching Consensus and Convergence to Equilibrium »
Shuyue Hu · Harold Soh · Georgios Piliouras -
2023 Poster: Alternation makes the adversary weaker in two-player games »
Volkan Cevher · Ashok Cutkosky · Ali Kavis · Georgios Piliouras · Stratis Skoulakis · Luca Viano -
2023 Poster: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation »
Daiki E Matsunaga · Jongmin Lee · Jaeseok Yoon · Stefanos Leonardos · Pieter Abbeel · Kee-Eung Kim -
2023 Poster: Exploiting hidden structures in non-convex games for convergence to Nash equilibrium »
Iosif Sakos · Emmanouil-Vasileios Vlatakis-Gkaragkounis · Panayotis Mertikopoulos · Georgios Piliouras -
2022 Poster: Alternating Mirror Descent for Constrained Min-Max Games »
Andre Wibisono · Molei Tao · Georgios Piliouras -
2022 Poster: Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update »
Georgios Piliouras · Ryann Sim · Stratis Skoulakis -
2022 Poster: Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence »
Rahul Jain · Georgios Piliouras · Ryann Sim -
2021 Poster: Solving Min-Max Optimization with Hidden Structure via Gradient Descent Ascent »
Emmanouil-Vasileios Vlatakis-Gkaragkounis · Lampros Flokas · Georgios Piliouras -
2021 Poster: Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality »
Stefanos Leonardos · Georgios Piliouras · Kelly Spendlove -
2021 Poster: Online Learning in Periodic Zero-Sum Games »
Tanner Fiez · Ryann Sim · Stratis Skoulakis · Georgios Piliouras · Lillian Ratliff -
2019 Poster: First-order methods almost always avoid saddle points: The case of vanishing step-sizes »
Ioannis Panageas · Georgios Piliouras · Xiao Wang -
2019 Poster: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2019 Spotlight: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2017 Poster: Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos »
Gerasimos Palaiopanos · Ioannis Panageas · Georgios Piliouras -
2017 Spotlight: Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos »
Gerasimos Palaiopanos · Ioannis Panageas · Georgios Piliouras