Specification-Guided Learning of Nash Equilibria with High Social Welfare
Kishor Jothimurugan · Suguman Bansal · Osbert Bastani · Rajeev Alur
Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the goal of each agent. Then, guided by the structure of the specifications, our algorithm searches over policies to identify one that provably forms an $\epsilon$-Nash equilibrium (with high probability). Importantly, it prioritizes policies in a way that maximizes social welfare across all agents. Our empirical evaluation demonstrates that our algorithm computes equilibrium policies with high social welfare, whereas state-of-the-art baselines either fail to compute Nash equilibria or compute ones with comparatively lower social welfare.
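To make the two key notions in the abstract concrete, the following is a minimal sketch on a toy two-player matrix game. It is not the paper's algorithm, which works with learned policies and high-level specifications. It only illustrates what it means for a joint choice to be an $\epsilon$-Nash equilibrium (no agent gains more than $\epsilon$ by deviating alone) and how candidates might be examined in order of decreasing social welfare. The function names, the payoff table, and the definition of social welfare as the sum of agent payoffs are assumptions made for illustration.

```python
from itertools import product


def is_epsilon_nash(payoffs, joint, n_actions, eps):
    """Return True if no single agent can gain more than eps by deviating.

    payoffs: dict mapping a joint action tuple to a tuple of per-agent payoffs.
    joint: the joint action to check, one entry per agent.
    n_actions: number of available actions for each agent.
    """
    for agent in range(len(joint)):
        base = payoffs[joint][agent]
        for alt in range(n_actions[agent]):
            # Change only this agent's action; all others stay fixed.
            deviated = joint[:agent] + (alt,) + joint[agent + 1:]
            if payoffs[deviated][agent] > base + eps:
                return False
    return True


def best_welfare_equilibrium(payoffs, n_actions, eps):
    """Scan joint actions from highest to lowest social welfare and
    return the first one that passes the eps-Nash check (or None)."""
    candidates = sorted(
        product(*(range(n) for n in n_actions)),
        key=lambda joint: sum(payoffs[joint]),  # assumed welfare: sum of payoffs
        reverse=True,
    )
    for joint in candidates:
        if is_epsilon_nash(payoffs, joint, n_actions, eps):
            return joint
    return None


# Hypothetical coordination game with two equilibria of different welfare.
stag_hunt = {
    (0, 0): (4, 4),
    (0, 1): (0, 3),
    (1, 0): (3, 0),
    (1, 1): (3, 3),
}

# Hypothetical prisoner's-dilemma payoffs (action 0 = cooperate, 1 = defect).
dilemma = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

print(best_welfare_equilibrium(stag_hunt, (2, 2), eps=0.0))
print(best_welfare_equilibrium(dilemma, (2, 2), eps=0.0))
```

In the coordination game both (0, 0) and (1, 1) are equilibria, and ordering candidates by welfare selects the better one. In the dilemma the highest-welfare outcome fails the check at $\epsilon = 0$, so the search falls through to mutual defection; a larger tolerance such as $\epsilon = 2$ would accept mutual cooperation.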

Author Information

Kishor Jothimurugan (University of Pennsylvania)
Suguman Bansal (University of Pennsylvania)

Suguman Bansal is a postdoctoral researcher in the Department of Computer and Information Science at the University of Pennsylvania. Her research interests lie at the intersection of Artificial Intelligence and Programming Languages. Specifically, she develops tools and techniques to improve the quality of automated verification and synthesis of computational systems. Her recent work concerns providing formal guarantees about learning-enabled systems, with a focus on Reinforcement Learning. She received her Ph.D. (2020) and M.S. (2016) in Computer Science from Rice University, and a B.S. (with Honors) in Mathematics and Computer Science (2014) from Chennai Mathematical Institute. Her honors include being named an NSF/CRA Computing Innovation Fellow (2020) and an EECS Rising Star (2018), and receiving the Andrew Ladd Fellowship (2016), among others.

Osbert Bastani (University of Pennsylvania)
Rajeev Alur (University of Pennsylvania)
