
Meta-RL for Multi-Agent RL: Learning to Adapt to Evolving Agents
Matthias Gerstgrasser · David Parkes
Event URL: https://openreview.net/forum?id=0toY1f8-Iq9

In Multi-Agent RL, agents learn and evolve together, and each agent has to interact with a changing set of other agents. While this is generally viewed as a problem of non-stationarity, we propose that it can be viewed as a Meta-RL problem. We demonstrate an approach for learning Stackelberg equilibria, a type of equilibrium that features a bi-level optimization problem, where the inner level is a "best-response" of one or more follower agents to an evolving leader agent. Various approaches have been proposed in the literature to implement this best-response, most often treating each leader policy and the learning problem it induces for the follower(s) as a separate instance. We propose that the problem can be viewed as a meta (reinforcement) learning problem: learning to learn to best-respond to different leader behaviors, by leveraging commonality in the induced follower learning problems. We demonstrate an approach using contextual policies and show that it matches the performance of existing approaches while using significantly fewer environment samples in experiments. We discuss how more advanced meta-RL techniques could allow this to scale to richer domains.
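
To make the idea of a contextual follower policy concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a hypothetical 2x2 coordination game, a leader whose behavior is summarized by a single number `p` (its probability of playing action 0), and a follower that learns one linear value function over `p` instead of a separate learner per leader policy. The names `FOLLOWER_PAYOFF`, `TRAIN_CONTEXTS`, `train_contextual_follower`, and `follower_policy` are all illustrative assumptions. For simplicity the follower explores uniformly at random during training.

```python
import random

# Hypothetical follower payoffs for a 2x2 coordination game (an assumption,
# not taken from the paper): FOLLOWER_PAYOFF[leader_action][follower_action].
FOLLOWER_PAYOFF = [[1.0, 0.0], [0.0, 1.0]]

# Leader behaviors seen during training, each summarized by p = P(leader plays 0).
TRAIN_CONTEXTS = [0.1, 0.3, 0.7, 0.9]


def features(p):
    """Centered leader context plus a bias term."""
    return [p - 0.5, 1.0]


def q_value(weights, p, action):
    """Linear estimate of the follower's expected payoff for `action` given p."""
    return sum(w * x for w, x in zip(weights[action], features(p)))


def train_contextual_follower(steps=40000, lr=0.01, seed=0):
    """Fit one shared value function across all training leader contexts."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0], [0.0, 0.0]]  # one weight vector per follower action
    for _ in range(steps):
        p = rng.choice(TRAIN_CONTEXTS)              # sample a leader behavior
        leader_action = 0 if rng.random() < p else 1
        action = rng.randrange(2)                   # uniform exploration
        reward = FOLLOWER_PAYOFF[leader_action][action]
        error = reward - q_value(weights, p, action)
        # Stochastic gradient step on the squared prediction error.
        weights[action] = [
            w + lr * error * x for w, x in zip(weights[action], features(p))
        ]
    return weights


def follower_policy(weights, p):
    """Greedy contextual policy: best estimated action for leader context p."""
    return max(range(2), key=lambda a: q_value(weights, p, a))


def exact_best_response(p):
    """Ground-truth best response, computed from the payoff matrix."""
    expected = [
        p * FOLLOWER_PAYOFF[0][a] + (1 - p) * FOLLOWER_PAYOFF[1][a]
        for a in range(2)
    ]
    return max(range(2), key=lambda a: expected[a])


weights = train_contextual_follower()
for p in (0.2, 0.8):  # leader behaviors never seen during training
    print(p, follower_policy(weights, p), exact_best_response(p))
```

Because the value function is shared across leader contexts, the follower can respond to leader behaviors it was never trained against, which is one simple way to exploit commonality between the induced follower problems. Richer settings would presumably need a learned context representation rather than a single hand-chosen scalar.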

Author Information

Matthias Gerstgrasser (Harvard University)
David Parkes (Harvard University)

David C. Parkes is Gordon McKay Professor of Computer Science in the School of Engineering and Applied Sciences at Harvard University. He was the recipient of the NSF Career Award, the Alfred P. Sloan Fellowship, the Thouron Scholarship and the Harvard University Roslyn Abramson Award for Teaching. Parkes received his Ph.D. degree in Computer and Information Science from the University of Pennsylvania in 2001, and an M.Eng. (First class) in Engineering and Computing Science from Oxford University in 1995. At Harvard, Parkes leads the EconCS group and teaches classes in artificial intelligence, optimization, and topics at the intersection between computer science and economics. Parkes has served as Program Chair of ACM EC’07 and AAMAS’08 and General Chair of ACM EC’10, served on the editorial board of Journal of Artificial Intelligence Research, and currently serves as Editor of Games and Economic Behavior and on the boards of Journal of Autonomous Agents and Multi-agent Systems and INFORMS Journal of Computing. His research interests include computational mechanism design, electronic commerce, stochastic optimization, preference elicitation, market design, bounded rationality, computational social choice, networks and incentives, multi-agent systems, crowd-sourcing and social computing.
