We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
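The idea of backpropagating error derivatives through a noisy channel can be illustrated with a toy sketch. This is not the authors' implementation: the single scalar parameter `a`, the sigmoid squashing, the squared-error loss, the noise level `sigma`, and the thresholding used at execution time are all assumptions chosen for illustration. During (centralised) training the receiver's error gradient flows back through the channel into the sender; during (decentralised) execution the message is discretised and no gradients are exchanged.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_forward(a, noise):
    """Training-time channel (assumed form): add noise, then squash to (0, 1)."""
    return sigmoid(a + noise)


def channel_backward(message, grad_message):
    """Chain rule through the sigmoid: d(message)/d(a) = message * (1 - message)."""
    return grad_message * message * (1.0 - message)


def train_sender(target_bit, steps=200, lr=0.5, sigma=0.5, seed=0):
    """Centralised learning: the receiver's loss gradient updates the sender."""
    rng = random.Random(seed)
    a = 0.0  # hypothetical sender parameter: the pre-channel activation
    for _ in range(steps):
        noise = rng.gauss(0.0, sigma)
        m = channel_forward(a, noise)
        # Receiver-side loss (assumed): squared error against the bit to convey.
        grad_m = 2.0 * (m - target_bit)
        # The gradient crosses the channel and reaches the sender.
        a -= lr * channel_backward(m, grad_m)
    return a


def execute(a):
    """Decentralised execution (assumed): send a hard bit, no noise, no gradients."""
    return 1 if a > 0 else 0


if __name__ == "__main__":
    for bit in (0, 1):
        print(bit, execute(train_sender(bit)))
```

In this sketch the sender never observes the loss directly; it learns only from the derivative that the receiver passes back through the channel, which is the property that separates the differentiable approach from one based purely on Q-learning.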
Author Information
Jakob Foerster (University of Oxford)
Jakob Foerster is a PhD student in AI at the University of Oxford under the supervision of Shimon Whiteson and Nando de Freitas. Using deep reinforcement learning, he studies the emergence of communication in multi-agent AI systems. Prior to his PhD, Jakob spent four years working at Google and Goldman Sachs. He has also worked on a number of research projects in systems neuroscience, including work at MIT and the Weizmann Institute.
Yannis Assael (University of Oxford)
Nando de Freitas (University of Oxford)
Shimon Whiteson (University of Oxford)
More from the Same Authors
-
2020 Workshop: Talking to Strangers: Zero-Shot Emergent Communication »
Marie Ossenkopf · Angelos Filos · Abhinav Gupta · Michael Noukhovitch · Angeliki Lazaridou · Jakob Foerster · Kalesha Bullard · Rahma Chaabouni · Eugene Kharitonov · Roberto Dessì -
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning »
Tabish Rashid · Gregory Farquhar · Bei Peng · Shimon Whiteson -
2020 Poster: Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? »
Vitaly Kurin · Saad Godil · Shimon Whiteson · Bryan Catanzaro -
2020 Poster: Learning Retrospective Knowledge with Reverse Reinforcement Learning »
Shangtong Zhang · Vivek Veeriah · Shimon Whiteson -
2019 Workshop: Emergent Communication: Towards Natural Language »
Abhinav Gupta · Michael Noukhovitch · Cinjon Resnick · Natasha Jaques · Angelos Filos · Marie Ossenkopf · Angeliki Lazaridou · Jakob Foerster · Ryan Lowe · Douwe Kiela · Kyunghyun Cho -
2019 Poster: MAVEN: Multi-Agent Variational Exploration »
Anuj Mahajan · Tabish Rashid · Mikayel Samvelyan · Shimon Whiteson -
2019 Poster: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning »
Gregory Farquhar · Shimon Whiteson · Jakob Foerster -
2019 Poster: Multi-Agent Common Knowledge Reinforcement Learning »
Christian Schroeder de Witt · Jakob Foerster · Gregory Farquhar · Philip Torr · Wendelin Boehmer · Shimon Whiteson -
2019 Poster: DAC: The Double Actor-Critic Architecture for Learning Options »
Shangtong Zhang · Shimon Whiteson -
2019 Poster: Fast Efficient Hyperparameter Tuning for Policy Gradient Methods »
Supratik Paul · Vitaly Kurin · Shimon Whiteson -
2019 Poster: VIREL: A Variational Inference Framework for Reinforcement Learning »
Matthew Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson -
2019 Spotlight: VIREL: A Variational Inference Framework for Reinforcement Learning »
Matthew Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson -
2019 Poster: Generalized Off-Policy Actor-Critic »
Shangtong Zhang · Wendelin Boehmer · Shimon Whiteson -
2018 Workshop: Emergent Communication Workshop »
Jakob Foerster · Angeliki Lazaridou · Ryan Lowe · Igor Mordatch · Douwe Kiela · Kyunghyun Cho -
2017 Workshop: Emergent Communication Workshop »
Jakob Foerster · Igor Mordatch · Angeliki Lazaridou · Kyunghyun Cho · Douwe Kiela · Pieter Abbeel -
2017 Poster: Dynamic-Depth Context Tree Weighting »
Joao V Messias · Shimon Whiteson -
2017 Poster: Cortical microcircuits as gated-recurrent neural networks »
Rui Costa · Yannis Assael · Brendan Shillingford · Nando de Freitas · Tim Vogels -
2015 Poster: Copeland Dueling Bandits »
Masrour Zoghi · Zohar Karnin · Shimon Whiteson · Maarten de Rijke -
2014 Poster: Distributed Parameter Estimation in Probabilistic Graphical Models »
Yariv D Mizrahi · Misha Denil · Nando de Freitas -
2013 Workshop: Bayesian Optimization in Theory and Practice »
Matthew Hoffman · Jasper Snoek · Nando de Freitas · Michael A Osborne · Ryan Adams · Sebastien Bubeck · Philipp Hennig · Remi Munos · Andreas Krause -
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih -
2011 Workshop: Bayesian optimization, experimental design and bandits: Theory and applications »
Nando de Freitas · Roman Garnett · Frank R Hutter · Michael A Osborne -
2010 Session: Spotlights Session 10 »
Nando de Freitas -
2010 Session: Oral Session 12 »
Nando de Freitas -
2009 Workshop: Adaptive Sensing, Active Learning, and Experimental Design »
Rui M Castro · Nando de Freitas · Ruben Martinez-Cantin -
2009 Tutorial: Sequential Monte-Carlo Methods »
Arnaud Doucet · Nando de Freitas -
2008 Poster: An interior-point stochastic approximation method and an L1-regularized delta rule »
Peter Carbonetto · Mark Schmidt · Nando de Freitas -
2008 Oral: An interior-point stochastic approximation method and an L1-regularized delta rule »
Peter Carbonetto · Mark Schmidt · Nando de Freitas -
2008 Demonstration: Worio: A Web-Scale Machine Learning System »
Nando de Freitas · Ali Davar -
2007 Spotlight: Bayesian Policy Learning with Trans-Dimensional MCMC »
Matthew Hoffman · Arnaud Doucet · Nando de Freitas · Ajay Jasra -
2007 Poster: Bayesian Policy Learning with Trans-Dimensional MCMC »
Matthew Hoffman · Arnaud Doucet · Nando de Freitas · Ajay Jasra -
2007 Poster: Active Preference Learning with Discrete Choice Data »
Eric Brochu · Nando de Freitas · Abhijeet Ghosh -
2006 Poster: Conditional mean field »
Peter Carbonetto · Nando de Freitas