Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable PSRO-based method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO parallelizes PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that, unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect-information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of 10^50. P2SRO achieves state-of-the-art performance on Barrage Stratego and beats all existing bots. Experiment code is available at https://github.com/JBLanier/pipeline-psro.
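To make the sequential loop that P2SRO parallelizes more concrete, here is a minimal tabular sketch of a PSRO-style (double oracle) iteration on rock-paper-scissors. It is an illustration under stated assumptions, not the authors' implementation: an exact best response stands in for the reinforcement learning oracle, the restricted game is solved approximately with Hedge self-play (one possible choice of solver), and every function name and parameter below is hypothetical. The pipeline of parallel workers itself is not modeled.

```python
import math

# Row player's payoffs for rock-paper-scissors (zero-sum: column gets the negative).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def solve_restricted(payoff, rows, cols, steps=5000, eta=0.02):
    """Approximate a Nash equilibrium of the restricted game via Hedge self-play.

    Assumption: Hedge is used only as a simple stand-in solver; the time-averaged
    strategies approach an equilibrium of the restricted zero-sum game.
    """
    log_w_row, log_w_col = [0.0] * len(rows), [0.0] * len(cols)
    avg_row, avg_col = [0.0] * len(rows), [0.0] * len(cols)
    for _ in range(steps):
        p, q = softmax(log_w_row), softmax(log_w_col)
        for i in range(len(rows)):
            avg_row[i] += p[i] / steps
        for j in range(len(cols)):
            avg_col[j] += q[j] / steps
        # Reward each pure strategy by its expected payoff against the current mix.
        for i, r in enumerate(rows):
            log_w_row[i] += eta * sum(q[j] * payoff[r][c] for j, c in enumerate(cols))
        for j, c in enumerate(cols):
            log_w_col[j] -= eta * sum(p[i] * payoff[r][c] for i, r in enumerate(rows))
    return avg_row, avg_col


def expand(mix, population, n):
    """Lift a mix over the population to a mix over all n pure strategies."""
    full = [0.0] * n
    for prob, s in zip(mix, population):
        full[s] += prob
    return full


def row_values(payoff, col_mix):
    return [sum(a * q for a, q in zip(row, col_mix)) for row in payoff]


def col_values(payoff, row_mix):
    return [-sum(row_mix[i] * payoff[i][j] for i in range(len(payoff)))
            for j in range(len(payoff[0]))]


def exploitability(payoff, row_mix, col_mix):
    """Sum of both players' best-response values; zero at an exact equilibrium."""
    return max(row_values(payoff, col_mix)) + max(col_values(payoff, row_mix))


def psro(payoff, max_iters=10):
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # each population starts with one pure strategy
    for _ in range(max_iters):
        p, q = solve_restricted(payoff, rows, cols)
        row_mix, col_mix = expand(p, rows, n_rows), expand(q, cols, n_cols)
        rv, cv = row_values(payoff, col_mix), col_values(payoff, row_mix)
        br_row = max(range(n_rows), key=rv.__getitem__)
        br_col = max(range(n_cols), key=cv.__getitem__)
        if br_row in rows and br_col in cols:
            break  # no new best response: the restricted solution is final
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return row_mix, col_mix, rows, cols


if __name__ == "__main__":
    row_mix, col_mix, rows, cols = psro(RPS)
    print("populations:", rows, cols)
    print("exploitability:", exploitability(RPS, row_mix, col_mix))
```

In this sketch each new best response is computed only after the previous restricted game has been solved, which is the sequential bottleneck the abstract describes. In a large game the best-response step would be a full reinforcement learning run rather than an argmax over three actions.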
Author Information
Stephen McAleer (UC Irvine)
JB Lanier (University of California Irvine)
Roy Fox (UC Irvine)
Pierre Baldi (UC Irvine)
More from the Same Authors
-
2021 : Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates »
Litian Liang · Yaosheng Xu · Stephen McAleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox -
2021 : Target Entropy Annealing for Discrete Soft Actor-Critic »
Yaosheng Xu · Dailin Hu · Litian Liang · Stephen McAleer · Pieter Abbeel · Roy Fox -
2021 : Deep learning reconstruction of the neutrino energy with a shallow Askaryan detector »
Stephen McAleer · Christian Glaser · Pierre Baldi -
2021 : G-SpaNet: Generalized Permutationless Set Assignment for Particle Physics using Symmetry Preserving Attention »
Alexander Shmakov · Shih-chieh Hsu · Pierre Baldi -
2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 : Geometry-aware Autoregressive Models for Calorimeter Shower Simulations »
Junze Liu · Aishik Ghosh · Dylan Smith · Pierre Baldi · Daniel Whiteson -
2022 : Foundations of Attention Mechanisms in Deep Neural Network Architectures »
Pierre Baldi · Roman Vershynin -
2022 : Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments »
JB Lanier · Stephen McAleer · Pierre Baldi · Roy Fox -
2022 : ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret »
Stephen McAleer · Gabriele Farina · Marc Lanctot · Tuomas Sandholm -
2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2021 Poster: XDO: A Double Oracle Algorithm for Extensive-Form Games »
Stephen McAleer · JB Lanier · Kevin A Wang · Pierre Baldi · Roy Fox -
2019 : Coffee Break & Poster Session »
Samia Mohinta · Andrea Agostinelli · Alexandra Moringen · Jee Hang Lee · Yat Long Lo · Wolfgang Maass · Blue Sheffer · Colin Bredenberg · Benjamin Eysenbach · Liyu Xia · Efstratios Markou · Jan Lichtenberg · Pierre Richemond · Tony Zhang · JB Lanier · Baihan Lin · William Fedus · Glen Berseth · Marta Sarrico · Matthew Crosby · Stephen McAleer · Sina Ghiassian · Franz Scherr · Guillaume Bellec · Darjan Salaj · Arinbjörn Kolbeinsson · Matthew Rosenberg · Jaehoon Shin · Sang Wan Lee · Guillermo Cecchi · Irina Rish · Elias Hajek -
2019 Poster: Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes »
Lingge Li · Dustin Pluta · Babak Shahbaba · Norbert Fortin · Hernando Ombao · Pierre Baldi -
2018 Poster: On Neuronal Capacity »
Pierre Baldi · Roman Vershynin -
2018 Oral: On Neuronal Capacity »
Pierre Baldi · Roman Vershynin -
2017 : Poster session »
Abbas Zaidi · Christoph Kurz · David Heckerman · YiJyun Lin · Stefan Riezler · Ilya Shpitser · Songbai Yan · Olivier Goudet · Yash Deshpande · Judea Pearl · Jovana Mitrovic · Brian Vegetabile · Tae Hwy Lee · Karen Sachs · Karthika Mohan · Reagan Rose · Julius Ramakers · Negar Hassanpour · Pierre Baldi · Razieh Nabi · Noah Hammarlund · Eli Sherman · Carolin Lawrence · Fattaneh Jabbari · Vira Semenova · Maria Dimakopoulou · Pratik Gajane · Russell Greiner · Ilias Zadik · Alexander Blocker · Hao Xu · Tal EL HAY · Tony Jebara · Benoit Rostykus -
2014 Workshop: High-energy particle physics, machine learning, and the HiggsML data challenge (HEPML) »
Glen Cowan · Balázs Kégl · Kyle Cranmer · Gábor Melis · Tim Salimans · Vladimir Vava Gligorov · Daniel Whiteson · Lester Mackey · Wojciech Kotlowski · Roberto Díaz Morales · Pierre Baldi · Cecile Germain · David Rousseau · Isabelle Guyon · Tianqi Chen -
2014 Poster: Searching for Higgs Boson Decay Modes with Deep Learning »
Peter Sadowski · Daniel Whiteson · Pierre Baldi -
2014 Spotlight: Searching for Higgs Boson Decay Modes with Deep Learning »
Peter Sadowski · Daniel Whiteson · Pierre Baldi -
2013 Poster: Understanding Dropout »
Pierre Baldi · Peter Sadowski -
2013 Oral: Understanding Dropout »
Pierre Baldi · Peter Sadowski -
2012 Poster: Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction »
Pietro Di Lena · Pierre Baldi · Ken Nagata -
2012 Spotlight: Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction »
Pietro Di Lena · Pierre Baldi · Ken Nagata -
2011 Poster: A Machine Learning Approach to Predict Chemical Reactions »
Matthew A Kayala · Pierre Baldi -
2010 Workshop: Charting Chemical Space: Challenges and Opportunities for AI and Machine Learning »
Pierre Baldi · Klaus-Robert Müller · Gisbert Schneider -
2007 Poster: Mining Internet-Scale Software Repositories »
Erik Linstead · Paul Rigor, Ph.D. · Sushil Bajracharya · Cristina Lopes · Pierre Baldi -
2006 Poster: A Scalable Machine Learning Approach to Go »
Lin Wu · Pierre Baldi -
2006 Talk: A Scalable Machine Learning Approach to Go »
Lin Wu · Pierre Baldi