The Evolutionary Dynamics of Soft-Max Policy Gradient in Multi-Agent Settings
Martino Bernasconi · Federico Cacciamani · Simone Fioravanti · Nicola Gatti · Francesco Trovò
Event URL: https://openreview.net/forum?id=N6zUOmM4ihO

Policy gradient is one of the best-known algorithms in reinforcement learning. In this paper, we derive the mean dynamics of the soft-max policy gradient algorithm in multi-agent settings using tools from evolutionary game theory. Studying these dynamics is crucial for understanding the algorithm's weaknesses and for suggesting how to overcome them. The mean dynamics of most multi-agent reinforcement learning algorithms are slight variants of the replicator dynamics that preserve its properties. The soft-max policy gradient dynamics, by contrast, have a different structure. They nevertheless remain closely connected to the replicator dynamics: they are the replicator dynamics applied to a non-linear transformation of the fitness function. We analyze the case of learning a best response separately from the cases of single- and multi-population games. In the best-response case, we show that the soft-max policy gradient dynamics always converge to the best response. However, unlike the replicator dynamics, they always admit a non-empty set of bad initializations from which convergence to the best response is not monotonic. Furthermore, in single- and multi-population games, we show that the soft-max policy gradient dynamics satisfy a weaker set of properties than the replicator dynamics.
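The statement that the soft-max policy gradient dynamics are a replicator dynamics applied to a non-linear transformation of the fitness can be illustrated in the simplest setting: a single agent learning a best response against a fixed payoff vector u, with exact (expected) gradients in continuous time. This setting, the example numbers, and all function names (`softmax`, `replicator_field`, `softmax_pg_field`, `simulate_pg`) are assumptions chosen for illustration, not code from the paper. With π = softmax(θ) and J(θ) = Σ_a π_a u_a, the gradient is dθ_a/dt = π_a (u_a - π·u). Pushing this through the soft-max Jacobian π_a (δ_ab - π_b) gives dπ_a/dt = π_a (f_a - Σ_b π_b f_b) with transformed fitness f_a = π_a (u_a - π·u). The minimal sketch below encodes exactly this and contrasts it with the plain replicator dynamics dπ_a/dt = π_a (u_a - π·u).

```python
import math


def softmax(theta):
    """Soft-max policy from logits theta (shifted by the max for stability)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def replicator_field(pi, u):
    """Replicator dynamics for fixed payoffs u: dpi_a/dt = pi_a * (u_a - pi.u).

    The same expression is the exact gradient of J(theta) = pi.u with respect
    to the logit theta_a when pi = softmax(theta).
    """
    avg = sum(p * ua for p, ua in zip(pi, u))
    return [p * (ua - avg) for p, ua in zip(pi, u)]


def softmax_pg_field(pi, u):
    """Continuous-time soft-max policy gradient, expressed on the simplex.

    Mapping dtheta/dt = grad J through the soft-max Jacobian
    pi_a * (delta_ab - pi_b) yields a replicator dynamics whose fitness is
    the transformed payoff f_a = pi_a * (u_a - pi.u).
    """
    f = replicator_field(pi, u)  # non-linear transformation of the fitness
    return replicator_field(pi, f)  # replicator dynamics driven by f


def simulate_pg(theta, u, lr=0.01, steps=1000):
    """Euler steps of exact-gradient ascent on the logits; returns the pi trajectory."""
    theta = list(theta)
    traj = [softmax(theta)]
    for _ in range(steps):
        grad = replicator_field(softmax(theta), u)  # dJ/dtheta_a
        theta = [t + lr * g for t, g in zip(theta, grad)]
        traj.append(softmax(theta))
    return traj


if __name__ == "__main__":
    # Hypothetical 3-action payoff vector; action 0 is the best response.
    u = [1.0, 0.9, 0.0]
    # Hypothetical initialization with most mass on the near-optimal action 1.
    pi0 = [0.01, 0.9, 0.09]
    print(replicator_field(pi0, u)[0] > 0)  # True: replicator raises pi_0
    print(softmax_pg_field(pi0, u)[0] < 0)  # True: soft-max PG lowers pi_0 at first
    traj = simulate_pg([math.log(p) for p in pi0], u, steps=10)
    print(traj[1][0] < traj[0][0])  # True: the first Euler step also lowers pi_0
```

With these hypothetical values, the replicator dynamics immediately increase the probability of the best response, while the soft-max policy gradient dynamics initially decrease it. This is one concrete instance of the non-monotonic convergence described in the abstract. The sketch covers only the single-agent, fixed-payoff case and does not reproduce the paper's single- or multi-population analysis.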

Author Information

Martino Bernasconi (Politecnico di Milano)
Federico Cacciamani (Politecnico di Milano)
Simone Fioravanti (Gran Sasso Science Institute (GSSI))
Nicola Gatti (Politecnico di Milano)
Francesco Trovò (Politecnico di Milano)
