The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and acting in a shared multi-agent environment? Observing that humans often provide incentives to influence others' behavior, we propose to equip each RL agent in a multi-agent environment with the ability to give rewards directly to other agents, using a learned incentive function. Each agent learns its own incentive function by explicitly accounting for its impact on the learning of recipients and, through them, the impact on its own extrinsic objective. We demonstrate in experiments that such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games, often by finding a near-optimal division of labor. Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
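The core idea in the abstract, an incentive giver that accounts for how its reward changes the recipient's learning and thereby its own extrinsic return, can be illustrated with a toy sketch. This is not the authors' algorithm or code. The two-action payoffs, the learning rate, the incentive cost, and every function name below are hypothetical, and a finite-difference gradient stands in for whatever gradient computation the paper actually uses.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected(policy, rewards):
    return sum(p * r for p, r in zip(policy, rewards))

def recipient_step(theta, eta, recipient_env, lr):
    """One exact policy-gradient step on extrinsic reward plus received incentive."""
    pi = softmax(theta)
    total = [r + g for r, g in zip(recipient_env, eta)]
    baseline = expected(pi, total)
    # For a softmax policy, d E_pi[total] / d theta_k = pi_k * (total_k - baseline).
    return [t + lr * p * (r - baseline) for t, p, r in zip(theta, pi, total)]

def giver_objective(eta, theta, recipient_env, giver_env, lr, cost):
    """Giver's extrinsic return under the recipient's updated policy,
    minus a (hypothetical) cost proportional to the incentive paid."""
    new_pi = softmax(recipient_step(theta, eta, recipient_env, lr))
    return expected(new_pi, giver_env) - cost * expected(new_pi, eta)

def incentive_gradient(eta, theta, recipient_env, giver_env, lr, cost, eps=1e-5):
    """Central finite-difference gradient of the giver objective w.r.t. eta.
    It passes through the recipient's learning step, which is the point of the sketch."""
    grad = []
    for k in range(len(eta)):
        up = list(eta)
        up[k] += eps
        down = list(eta)
        down[k] -= eps
        j_up = giver_objective(up, theta, recipient_env, giver_env, lr, cost)
        j_down = giver_objective(down, theta, recipient_env, giver_env, lr, cost)
        grad.append((j_up - j_down) / (2 * eps))
    return grad

# Hypothetical payoffs: the recipient prefers action 0, the giver benefits from action 1.
RECIPIENT_ENV = [1.0, 0.0]
GIVER_ENV = [0.0, 2.0]

theta = [0.0, 0.0]  # recipient policy logits
eta = [0.0, 0.0]    # incentive paid for each recipient action
grad = incentive_gradient(eta, theta, RECIPIENT_ENV, GIVER_ENV, lr=1.0, cost=0.1)
print("gradient of giver objective w.r.t. incentives:", grad)
```

In this toy setup the gradient is positive for the incentive on action 1 and negative for the incentive on action 0, so a giver following it would start paying the recipient for the action that benefits the giver. The sketch covers a single recipient, a single update step, and a tabular incentive, and says nothing about the Markov games used in the experiments.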
Author Information
Jiachen Yang (Georgia Institute of Technology)
Ang Li (DeepMind, Mountain View)
Mehrdad Farajtabar (DeepMind)
Peter Sunehag (Google - DeepMind)
Edward Hughes (DeepMind)
Hongyuan Zha (Georgia Tech)
More from the Same Authors
- 2021 Spotlight: Collaborating with Humans without Human Data
  DJ Strouse · Kevin McKee · Matt Botvinick · Edward Hughes · Richard Everett
- 2021 : One Pass ImageNet
  Clara Huiyi Hu · Ang Li · Daniele Calandriello · Dilan Gorur
- 2022 Poster: A Unified Framework for Deep Symbolic Regression
  Mikel Landajuela · Chak Shing Lee · Jiachen Yang · Ruben Glatt · Claudio P Santiago · Ignacio Aravena · Terrell Mundhenk · Garrett Mulcahy · Brenden K Petersen
- 2021 Workshop: Cooperative AI
  Natasha Jaques · Edward Hughes · Jakob Foerster · Noam Brown · Kalesha Bullard · Charlotte Smith
- 2021 : Welcome and Opening Remarks
  Edward Hughes · Natasha Jaques
- 2021 Poster: Collaborating with Humans without Human Data
  DJ Strouse · Kevin McKee · Matt Botvinick · Edward Hughes · Richard Everett
- 2021 Poster: Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators
  Qitian Wu · Rui Gao · Hongyuan Zha
- 2021 Poster: Random Noise Defense Against Query-Based Black-Box Attacks
  Zeyu Qin · Yanbo Fan · Hongyuan Zha · Baoyuan Wu
- 2020 Poster: Network Diffusions via Neural Mean-Field Dynamics
  Shushan He · Hongyuan Zha · Xiaojing Ye
- 2020 Poster: Understanding the Role of Training Regimes in Continual Learning
  Seyed Iman Mirzadeh · Mehrdad Farajtabar · Razvan Pascanu · Hassan Ghasemzadeh
- 2020 Poster: Self-Distillation Amplifies Regularization in Hilbert Space
  Hossein Mobahi · Mehrdad Farajtabar · Peter Bartlett
- 2020 Poster: Differentiable Top-k with Optimal Transport
  Yujia Xie · Hanjun Dai · Minshuo Chen · Bo Dai · Tuo Zhao · Hongyuan Zha · Wei Wei · Tomas Pfister
- 2020 Poster: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
  Nevena Lazic · Dong Yin · Mehrdad Farajtabar · Nir Levine · Dilan Gorur · Chris Harris · Dale Schuurmans
- 2020 Poster: Learning Strategic Network Emergence Games
  Rakshit Trivedi · Hongyuan Zha
- 2019 Workshop: Learning with Temporal Point Processes
  Manuel Rodriguez · Le Song · Isabel Valera · Yan Liu · Abir De · Hongyuan Zha
- 2019 Poster: Meta Learning with Relational Information for Short Sequences
  Yujia Xie · Haoming Jiang · Feng Liu · Tuo Zhao · Hongyuan Zha
- 2018 Poster: Inequity aversion improves cooperation in intertemporal social dilemmas
  Edward Hughes · Joel Leibo · Matthew Phillips · Karl Tuyls · Edgar Dueñez-Guzman · Antonio García Castañeda · Iain Dunning · Tina Zhu · Kevin McKee · Raphael Koster · Heather Roff · Thore Graepel
- 2017 Poster: A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering
  Hongteng Xu · Hongyuan Zha
- 2017 Poster: Predicting User Activity Level In Point Processes With Mass Transport Equation
  Yichen Wang · Xiaojing Ye · Hongyuan Zha · Le Song
- 2017 Poster: Wasserstein Learning of Deep Generative Point Process Models
  Shuai Xiao · Mehrdad Farajtabar · Xiaojing Ye · Junchi Yan · Xiaokang Yang · Le Song · Hongyuan Zha
- 2016 Poster: Multistage Campaigning in Social Networks
  Mehrdad Farajtabar · Xiaojing Ye · Sahar Harati · Le Song · Hongyuan Zha
- 2015 Poster: COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
  Mehrdad Farajtabar · Yichen Wang · Manuel Rodriguez · Shuang Li · Hongyuan Zha · Le Song
- 2015 Oral: COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
  Mehrdad Farajtabar · Yichen Wang · Manuel Rodriguez · Shuang Li · Hongyuan Zha · Le Song
- 2014 Poster: Shaping Social Activity by Incentivizing Users
  Mehrdad Farajtabar · Nan Du · Manuel Gomez Rodriguez · Isabel Valera · Hongyuan Zha · Le Song
- 2013 Poster: Scalable Influence Estimation in Continuous-Time Diffusion Networks
  Nan Du · Le Song · Manuel Gomez Rodriguez · Hongyuan Zha
- 2013 Oral: Scalable Influence Estimation in Continuous-Time Diffusion Networks
  Nan Du · Le Song · Manuel Gomez Rodriguez · Hongyuan Zha
- 2009 Poster: Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora
  Shuang Yang · Hongyuan Zha · Bao-Gang Hu
- 2008 Poster: Convergence and Rate of Convergence of A Manifold-Based Dimension Reduction
  Andrew Smith · Xiaoming Huo · Hongyuan Zha
- 2007 Poster: A General Boosting Method and its Application to Learning Ranking Functions for Web Search
  Zhaohui Zheng · Hongyuan Zha · Tong Zhang · Olivier Chapelle · Keke Chen · Gordon Sun