Poster
Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem
Fivos Kalogiannis · Jingming Yan · Ioannis Panageas
West Ballroom A-D #6910
Thu 12 Dec 4:30 p.m. – 7:30 p.m. PST
Abstract:
We study the problem of learning a Nash equilibrium (NE) in Markov games, a cornerstone problem in multi-agent reinforcement learning (MARL). In particular, we focus on infinite-horizon adversarial team Markov games (ATMGs), in which agents that share a common reward function compete against a single opponent, called *the adversary*. These games unify two-player zero-sum Markov games and Markov potential games, resulting in a setting that encompasses both collaboration and competition. Kalogiannis et al. (2023) provided an efficient equilibrium computation algorithm for ATMGs that presumes knowledge of the reward and transition functions and offers no sample complexity guarantees. Our contribution is the design of a learning algorithm that utilizes MARL policy gradient methods, with iteration and sample complexity polynomial in $1/\epsilon$, where $\epsilon$ is the approximation error, and in other natural parameters of the ATMG, answering the main open question of Kalogiannis et al. (2023). It is worth noting that, prior to our work, learning algorithms for NE were known for two-player zero-sum Markov games and Markov potential games, but not for ATMGs. Seen through the lens of min-max optimization, computing a NE in these games constitutes a nonconvex--nonconcave saddle-point problem. Min-max optimization has received extensive study in recent years; nevertheless, the nonconvex--nonconcave case remains elusive: it has been established that finding saddle points in full generality is computationally intractable (Daskalakis et al., 2021). We circumvent these intractability issues by developing techniques that exploit the hidden structure of the objective function via a nonconvex--concave reformulation. This reformulation, however, introduces a feasibility set with coupled constraints that poses additional challenges. We tackle these challenges by establishing novel techniques for optimizing weakly smooth nonconvex functions, extending the framework of Devolder et al. (2014). We believe that our techniques can be used for learning or computing saddle points in other nonconvex--nonconcave settings beyond ATMGs and may be of independent interest.
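As a pointer for the reader, the saddle-point problem in question can be sketched as follows; the notation ($V_\rho$ for the team's expected discounted common reward under initial-state distribution $\rho$, team policies $\pi_1, \dots, \pi_n$, adversary policy $\pi_{\mathrm{adv}}$) is standard in the MARL literature but is assumed here, not fixed by the abstract. Since the team shares one reward function and the adversary receives its negation, computing a NE can be cast as the min-max problem

$$\min_{\pi_{\mathrm{adv}}} \; \max_{\pi_1, \dots, \pi_n} \; V_\rho(\pi_1, \dots, \pi_n, \pi_{\mathrm{adv}}),$$

which is nonconvex--nonconcave in the policy parameters, in part because the team maximizes over a product of independent policies. The "hidden concavity" of the title refers to the fact that a suitable change of variables (e.g., to occupancy-measure-like variables) renders the inner maximization concave, at the cost of the coupled feasibility constraints mentioned in the abstract.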