The human ability to cooperate in a wide range of contexts is a key ingredient in the success of our species. Problems of cooperation, in which agents seek ways to jointly improve their welfare, are ubiquitous and important. They can be found at every scale, from the daily routines of highway driving, communicating in a shared language, and collaborating at work, to the global challenges of climate change, pandemic preparedness, and international trade. With AI agents playing an ever greater role in our lives, we must endow them with similar abilities. In particular, they must understand the behaviors of others, find common ground by which to communicate with them, make credible commitments, and establish institutions that promote cooperative behavior.

By construction, the goal of Cooperative AI is interdisciplinary in nature. Our workshop will therefore bring together scholars from diverse backgrounds, including reinforcement learning (and inverse RL), multi-agent systems, human-AI interaction, game theory, mechanism design, social choice, fairness, cognitive science, language learning, and interpretability.

This year we will organize the workshop along two axes. First, we will discuss how to incentivize cooperation in AI systems, developing algorithms that can act effectively in general-sum settings and that encourage others to cooperate. The second focus is on how to implement effective coordination, given that cooperation is already incentivized. For example, we may examine zero-shot coordination, in which AI agents need to coordinate with novel partners at test time. This setting is highly relevant to human-AI coordination and provides a stepping stone for the community towards full Cooperative AI.
Tue 5:20 a.m. – 5:30 a.m.

Welcome and Opening Remarks
Edward Hughes · Natasha Jaques
Tue 5:30 a.m. – 6:00 a.m.

Invited Talk: Bo An (Nanyang Technological University) on Learning to Coordinate in Complex Environments
(Invited Talk)
Bo An
Tue 6:00 a.m. – 6:30 a.m.

Invited Talk: Michael Muthukrishna (London School of Economics) on Cultural Evolution and Human Cooperation
(Invited Talk)
In the modern world, we cooperate with and live side by side with strangers, who often look, act, and speak in ways very different from us. We work together on goals with culturally distant nations that span the globe. I'm recording this talk, but I could have given it to you in person. That's unusual in many respects. It's unusual from a cross-species perspective: comparing us to our closest primate cousins, a room full of strange chimps is a room full of dead chimps. It's unusual from a historical perspective: even a few hundred years ago, a stranger in our midst was a potential threat. And it's unusual from a geographic perspective: even today, some places are safer and more cooperative than others. Cooperation varies in scale, intensity, and domain: some countries cooperate on healthcare, others on defence. Compounding the puzzle, the evolutionary mechanisms that explain cooperation undermine one another and can stabilize non-cooperative or even maladaptive behavior. I'll discuss the latest discoveries in the science of cultural evolution and human cooperation, and how these might apply to the development of cooperative AI.
Michael Muthukrishna
Tue 6:30 a.m. – 7:00 a.m.

Invited Talk: Pablo Castro (Google Brain) on Estimating Policy Functions in Payment Systems using Reinforcement Learning
(Invited Talk)
In this talk I will present some of our findings (in collaboration with the Bank of Canada) on using RL to approximate the policy rules of banks participating in a high-value payments system. The objective of the agents is to learn a policy function for the choice of amount of liquidity provided to the system at the beginning of the day. Individual choices have complex strategic effects precluding a closed-form solution of the optimal policy, except in simple cases. We show that in a simplified two-agent setting, agents using reinforcement learning do learn the optimal policy that minimizes the cost of processing their individual payments. We also show that in more complex settings, both agents learn to reduce their liquidity costs. Our results show the applicability of RL to estimating best-response functions in real-world strategic games.
Pablo Samuel Castro
Tue 7:00 a.m. – 7:15 a.m.

(Live) Q&A with Invited Speaker (Bo An)
(Live Q&A)
Tue 7:15 a.m. – 7:30 a.m.

(Live) Q&A with Invited Speaker (Michael Muthukrishna)
(Live Q&A)
Tue 7:30 a.m. – 7:45 a.m.

(Live) Q&A with Invited Speaker (Pablo Castro)
(Live Q&A)
Tue 7:45 a.m. – 8:15 a.m.

Invited Talk: Ariel Procaccia (Harvard University) on Democracy and the Pursuit of Randomness
(Invited Talk)
Sortition is a storied paradigm of democracy built on the idea of choosing representatives through lotteries instead of elections. In recent years this idea has found renewed popularity in the form of citizens’ assemblies, which bring together randomly selected people from all walks of life to discuss key questions and deliver policy recommendations. A principled approach to sortition, however, must resolve the tension between two competing requirements: that the demographic composition of citizens’ assemblies reflect the general population, and that every person be given a fair chance (literally) to participate. I will describe our work on designing, analyzing, and implementing randomized participant selection algorithms that balance these two requirements. I will also discuss practical challenges in sortition, based on experience with the adoption and deployment of our open-source system, Panelot.
Ariel Procaccia
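The tension the abstract describes can be seen in a toy simulation. The sketch below is illustrative only (the pool, quotas, and feature names are invented, and this is not the algorithm behind Panelot): it rejection-samples uniformly random panels that satisfy demographic quotas, then tallies how often each volunteer is picked.

```python
import random
from collections import Counter

def sample_panel(pool, quotas, rng):
    """Rejection-sample a uniformly random quota-respecting panel."""
    size = sum(quotas.values())
    while True:
        panel = rng.sample(pool, size)
        counts = Counter(p["gender"] for p in panel)
        if all(counts[g] == q for g, q in quotas.items()):
            return panel

rng = random.Random(0)
# hypothetical volunteer pool: 30 women and 20 men
pool = [{"id": i, "gender": "F" if i < 30 else "M"} for i in range(50)]
quotas = {"F": 3, "M": 2}  # 5-seat panel mirroring a 60/40 target split

panels = [sample_panel(pool, quotas, rng) for _ in range(200)]
picks = Counter(p["id"] for panel in panels for p in panel)
# within each quota group, uniform sampling gives everyone the same
# selection probability (3/30 = 2/20 = 10% here); skewed quotas or an
# unbalanced pool pull these probabilities apart, which is exactly the
# representativeness-vs-fairness tension the talk addresses
```

The work described in the talk goes further than this baseline, computing distributions over feasible panels that maximize the minimum selection probability; the sketch only shows why quotas and equal chances can conflict.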
Tue 8:15 a.m. – 8:45 a.m.

Invited Talk: Dorsa Sadigh (Stanford University) on The Role of Conventions in Adaptive Human-AI Interaction
(Invited Talk)
Today I will be talking about the role of conventions in human-AI collaboration. Conventions are norms/equilibria we build through repeated interactions with each other. The idea of conventions has been well-studied in linguistics. We will start the talk by discussing the notion of linguistic conventions, and how we can build AI agents that can effectively build these conventions. We then extend the idea of linguistic conventions to conventions through actions. We discuss a modular approach to separate partner-specific conventions and rule-dependent representations. We then discuss how this can be done effectively when working with partners whose actions are high-dimensional. Finally, we extend the notion of conventions to larger-scale systems beyond dyadic interactions. Specifically, we discuss what conventions/equilibria emerge in mixed-autonomy traffic networks and how they can be leveraged for better dynamic routing of vehicles.
Dorsa Sadigh
Tue 8:45 a.m. – 9:15 a.m.

(Live) Invited Talk: Nika Haghtalab (UC Berkeley) on Collaborative Machine Learning: Training and Incentives
((Live) Invited Talk)
Many modern machine learning paradigms require large amounts of data and computation power that are rarely seen in one place or owned by one agent. In recent years, methods such as federated learning have been embraced as an approach for bringing about collaboration across learning agents. In practice, the success of these methods relies upon our ability to pool together the efforts of large numbers of individual learning agents, data set owners, and curators. In this talk, I will discuss how recruiting, serving, and retaining these agents requires us to address agents’ needs, limitations, and responsibilities. In particular, I will discuss two major questions in this field. First, how can we design collaborative learning mechanisms that benefit agents with heterogeneous learning objectives? Second, how can we ensure that the burden of data collection and learning is shared equitably between agents?
Nika Haghtalab
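Federated learning, mentioned in the abstract, can be sketched in a few lines. This toy example (the setup and numbers are my own, not from the talk) runs federated averaging on a scalar least-squares model: each client takes one local pass over its own data, and the server averages the resulting models.

```python
def fed_avg(client_data, rounds, lr=0.5):
    """Toy federated averaging for a scalar least-squares model w."""
    w = 0.0
    for _ in range(rounds):
        updates = []
        for data in client_data:            # each client trains locally...
            wi = w
            for x in data:                  # one pass of SGD on (wi - x)^2
                wi -= lr * 2 * (wi - x) / len(data)
            updates.append(wi)
        w = sum(updates) / len(updates)     # ...and the server averages
    return w

# two clients with different local data: neither holds the global mean
# (2.0), but the averaged model converges to it across rounds
w = fed_avg([[1.0, 1.0], [3.0, 3.0]], rounds=50)
```

The same sketch also hints at the incentive questions in the talk: each client's local pass pulls the model toward its own optimum, so clients with heterogeneous objectives may prefer a different aggregate than the plain average.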
Tue 9:15 a.m. – 9:30 a.m.

(Live) Q&A with Invited Speaker (Ariel Procaccia)
(Live Q&A)
Tue 9:30 a.m. – 9:45 a.m.

(Live) Q&A with Invited Speaker (Dorsa Sadigh)
(Live Q&A)
Tue 9:45 a.m. – 10:00 a.m.

(Live) Q&A with Invited Speaker (Nika Haghtalab)
(Live Q&A)
Tue 10:00 a.m. – 11:00 a.m.

Workshop Poster Session 1 (hosted in GatherTown)
Tue 11:00 a.m. – 12:00 p.m.

Workshop Poster Session 2 (hosted in GatherTown)
Tue 12:00 p.m. – 1:00 p.m.

(Live) Panel Discussion: Cooperative AI
(Panel Discussion)
Kalesha Bullard · Allan Dafoe · Fei Fang · Chris Amato · Elizabeth M. Adams
Tue 1:00 p.m. – 1:15 p.m.

Spotlight Talk: Interactive Inverse Reinforcement Learning for Cooperative Games
(Spotlight Talk)
We study the problem of designing AI agents that cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov Decision Process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent acts so as to maximise expected utility given the first agent's policy. How should the first agent act so that it can learn the joint reward function as quickly as possible, and so that the joint policy is as close to optimal as possible? In this paper, we analyse how knowledge about the reward function can be gained. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.
Thomas Kleine Büning · Anne-Marie George · Christos Dimitrakakis
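The Stackelberg structure in this abstract can be illustrated with a toy one-shot game (the payoff matrix below is an invented example, not from the paper): the leader commits to an action, the follower best-responds to maximise the shared reward, and the leader chooses its commitment anticipating that response.

```python
# joint reward shared by both agents; rows = leader action, cols = follower action
R = [[3, 0],
     [2, 2]]

def follower_best_response(a):
    """The follower maximises the joint reward given the leader's action."""
    return max(range(len(R[a])), key=lambda b: R[a][b])

def leader_commitment():
    """The leader commits to the action whose induced outcome is best."""
    return max(range(len(R)), key=lambda a: R[a][follower_best_response(a)])

a = leader_commitment()         # leader plays action 0
b = follower_best_response(a)   # follower answers with action 0, joint reward 3
```

In the paper's setting the leader does not know R and must also act so as to learn it from interaction; here R is given only to show the commitment-and-best-response structure.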
Tue 1:15 p.m. – 1:30 p.m.

Spotlight Talk: Learning to solve complex tasks by growing knowledge culturally across generations
(Spotlight Talk)
Knowledge built culturally across generations allows humans to learn far more than an individual could glean from their own experience in a lifetime. Cultural knowledge in turn rests on language: language is the richest record of what previous generations believed, valued, and practiced, and how these evolved over time. The power and mechanisms of language as a means of cultural learning, however, are not well understood, and as a result, current AI systems do not leverage language as a means for cultural knowledge transmission. Here, we take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Human participants were limited to only two attempts (two lives) to beat each game and were allowed to write a message to a future participant, who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multi-generational learning followed a strikingly similar trajectory to individuals learning alone with an unlimited number of lives. These results suggest that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we pioneer here is thus a rich test bed for developing AI systems capable of acquiring and transmitting cultural knowledge.
Noah Goodman · Josh Tenenbaum · MH Tessler · Jason Madeano
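The iterated learning paradigm in this abstract has a simple computational skeleton. The sketch below is a deliberately stylized stand-in (the hidden-code "game" is invented, not one of the paper's video games): each generation reads its predecessor's message, plays with a fixed number of lives, and writes a message for the next generation.

```python
def iterated_learning(generations, lives, play):
    """Each generation reads the inherited message, plays, and writes a new one."""
    message, scores = "", []
    for _ in range(generations):
        score, message = play(message, lives)
        scores.append(score)
    return scores

def play(message, lives, solution="31415"):
    """Toy game: discover a hidden code, one symbol per life.

    The inherited message encodes everything earlier generations learned,
    so each generation resumes where the last one stopped."""
    known = message
    for _ in range(lives):
        if len(known) < len(solution):
            known += solution[len(known)]
    return len(known), known

scores = iterated_learning(generations=4, lives=2, play=play)
# scores climb 2, 4, 5, 5: the two-life chain reaches the same endpoint
# as a single player given unlimited lives, mirroring the paper's finding
```

The interesting empirical question, which this skeleton abstracts away, is whether free-form written language can actually carry the accumulated knowledge between generations; the paper's result is that it can.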
Tue 1:30 p.m. – 1:45 p.m.

Spotlight Talk: On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)
(Spotlight Talk)
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{\mathcal{X}\mathcal{U}}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\sqrt{\mathcal{X}\mathcal{U}}\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\sqrt{\mathcal{X}\mathcal{U}}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $\mathcal{X},\mathcal{U}$ are the sizes of the state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.

Mridul Agarwal · Vaneet Aggarwal · Washim Mondal · Satish Ukkusuri
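The mean-field idea behind these bounds can be illustrated with a toy homogeneous population (the dynamics below are invented for illustration): when each agent's transition depends on the others only through the empirical state distribution, a single deterministic recursion over that distribution approximates the full $N$-agent simulation, with fluctuations shrinking as roughly $1/\sqrt{N}$.

```python
import random

def simulate(n, steps, seed=0):
    """Full n-agent simulation: each agent's flip probability depends on
    the empirical fraction mu of agents currently in state 1."""
    rng = random.Random(seed)
    states = [rng.random() < 0.5 for _ in range(n)]
    for _ in range(steps):
        mu = sum(states) / n                       # empirical mean field
        states = [rng.random() < 0.25 + 0.5 * mu for _ in states]
    return sum(states) / n

def mean_field(steps, mu=0.5):
    """Limiting dynamics: track only the distribution, whatever n is."""
    for _ in range(steps):
        mu = 0.25 + 0.5 * mu
    return mu

# for large n the two agree closely; here both sit near the fixed point 0.5,
# which is why a policy computed on the mean-field model transfers back to
# the finite population with a small, quantifiable error
```

The paper's contribution is the heterogeneous, multi-class version of this argument, with explicit error rates for the three ways reward and dynamics can depend on class-level distributions.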
Tue 1:45 p.m. – 2:00 p.m.

Spotlight Talk: Public Information Representation for Adversarial Team Games
(Spotlight Talk)
The study of sequential games in which a team plays against an adversary is receiving increasing attention in the scientific literature. Their peculiarity resides in the asymmetric information available to the team members during play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to Linear Programming and column generation techniques. Such representations prevent the adoption of standard tools for the generation of abstractions that previously proved crucial when solving huge two-player zero-sum games. Differently from those works, we investigate the problem of designing a suitable game representation over which abstraction algorithms can work. In particular, our algorithms convert a sequential team game with adversaries into a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player which only knows information common to the whole team and prescribes to the players an action for any possible private state. Our conversion enables the adoption of highly scalable techniques already available for two-player zero-sum games, including techniques for generating automated abstractions. Because of the NP-hard nature of the problem, the resulting public team game may be exponentially larger than the original one. To limit this explosion, we design three pruning techniques that dramatically reduce the size of the tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc Poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games to the converted games.
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti
Tue 2:00 p.m. – 2:15 p.m.

Closing Remarks
Gillian Hadfield
Author Information
Natasha Jaques (UC Berkeley)
Edward Hughes (DeepMind)
Jakob Foerster (University of Oxford)
Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multiagent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.
Noam Brown (Facebook AI Research)
Kalesha Bullard (DeepMind)
Charlotte Smith (DeepMind)
More from the Same Authors

2021 Spotlight: Collaborating with Humans without Human Data »
DJ Strouse · Kevin McKee · Matt Botvinick · Edward Hughes · Richard Everett 
2021 : Grounding Aleatoric Uncertainty in Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster 
2021 : No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients »
Risto Vuorio · Jacob Beck · Greg Farquhar · Jakob Foerster · Shimon Whiteson 
2021 : That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel 
2021 : A Fine-Tuning Approach to Belief State Modeling »
Samuel Sokota · Hengyuan Hu · David Wu · Jakob Foerster · Noam Brown 
2021 : Generalized Belief Learning in MultiAgent Settings »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster 
2021 : V&S – Panel discussion »
Michael Dennis · Stuart J Russell · Mireille Hildebrandt · Salome Viljoen · Natasha Jaques 
2021 : Welcome and Opening Remarks »
Edward Hughes · Natasha Jaques 
2021 Poster: Collaborating with Humans without Human Data »
DJ Strouse · Kevin McKee · Matt Botvinick · Edward Hughes · Richard Everett 
2021 Poster: Replay-Guided Adversarial Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel 
2021 Poster: K-level Reasoning for Zero-Shot Coordination in Hanabi »
Brandon Cui · Hengyuan Hu · Luis Pineda · Jakob Foerster 
2021 Poster: Neural Pseudo-Label Optimism for the Bank Loan Problem »
Aldo Pacchiano · Shaun Singh · Edward Chou · Alex Berg · Jakob Foerster 
2020 Workshop: Talking to Strangers: Zero-Shot Emergent Communication »
Marie Ossenkopf · Angelos Filos · Abhinav Gupta · Michael Noukhovitch · Angeliki Lazaridou · Jakob Foerster · Kalesha Bullard · Rahma Chaabouni · Eugene Kharitonov · Roberto Dessì 
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster 
2020 Poster: Learning to Incentivize Other Learning Agents »
Jiachen Yang · Ang Li · Mehrdad Farajtabar · Peter Sunehag · Edward Hughes · Hongyuan Zha 
2019 Workshop: Emergent Communication: Towards Natural Language »
Abhinav Gupta · Michael Noukhovitch · Cinjon Resnick · Natasha Jaques · Angelos Filos · Marie Ossenkopf · Angeliki Lazaridou · Jakob Foerster · Ryan Lowe · Douwe Kiela · Kyunghyun Cho 
2019 Poster: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning »
Gregory Farquhar · Shimon Whiteson · Jakob Foerster 
2019 Poster: Multi-Agent Common Knowledge Reinforcement Learning »
Christian Schroeder de Witt · Jakob Foerster · Gregory Farquhar · Philip Torr · Wendelin Boehmer · Shimon Whiteson 
2018 Workshop: Emergent Communication Workshop »
Jakob Foerster · Angeliki Lazaridou · Ryan Lowe · Igor Mordatch · Douwe Kiela · Kyunghyun Cho 
2018 Poster: Inequity aversion improves cooperation in intertemporal social dilemmas »
Edward Hughes · Joel Leibo · Matthew Phillips · Karl Tuyls · Edgar Dueñez-Guzman · Antonio García Castañeda · Iain Dunning · Tina Zhu · Kevin McKee · Raphael Koster · Heather Roff · Thore Graepel 
2017 Workshop: Emergent Communication Workshop »
Jakob Foerster · Igor Mordatch · Angeliki Lazaridou · Kyunghyun Cho · Douwe Kiela · Pieter Abbeel 
2017 Demonstration: Libratus: Beating Top Humans in No-Limit Poker »
Noam Brown · Tuomas Sandholm 
2017 Poster: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm 
2017 Oral: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm 
2016 Poster: Learning to Communicate with Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Yannis Assael · Nando de Freitas · Shimon Whiteson