Regret-based algorithms are highly efficient at finding approximate Nash equilibria in sequential games such as poker. However, most of them, including counterfactual regret minimization (CFR) and its variants, rely on iterate averaging to achieve convergence. Inspired by recent advances on last-iterate convergence of optimistic algorithms in zero-sum normal-form games, we provide a comprehensive study of last-iterate convergence for zero-sum extensive-form games with perfect recall (EFGs), using various optimistic regret-minimization algorithms over treeplexes. These include algorithms using the vanilla entropy or squared Euclidean norm regularizers, as well as their dilated versions, which admit more efficient implementations. In contrast to CFR, we show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast. We also provide experiments to further support our theoretical results.
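To make the distinction between iterate averaging and last-iterate convergence concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm or experimental setup: it uses a 2x2 normal-form game (matching pennies) rather than an extensive-form game over treeplexes, and the names `run`, `mwu_step`, and `duality_gap`, the step size `eta=0.1`, the starting strategies, and the iteration count are all assumptions chosen for illustration. With the entropy regularizer, the plain update is multiplicative weights, and the optimistic variant replaces the current loss g_t with the corrected loss 2*g_t - g_{t-1}.

```python
import math

# Toy zero-sum matrix game (matching pennies): x minimizes x^T A y, y maximizes.
# Its unique Nash equilibrium is uniform play for both players.
A = [[1.0, -1.0], [-1.0, 1.0]]

def loss_x(y):
    """Loss vector of the x-player: A y."""
    return [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]

def loss_y(x):
    """Loss vector of the y-player: -A^T x (y maximizes the payoff)."""
    return [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]

def mwu_step(p, loss, eta):
    """Entropy-regularized step: new p_i is proportional to p_i * exp(-eta * loss_i)."""
    w = [pi * math.exp(-eta * li) for pi, li in zip(p, loss)]
    total = sum(w)
    return [wi / total for wi in w]

def duality_gap(x, y):
    """max_y' x^T A y' - min_x' x'^T A y; zero exactly at a Nash equilibrium."""
    return max(-l for l in loss_y(x)) - min(loss_x(y))

def run(optimistic, eta=0.1, steps=5000):
    """Return the LAST iterate (no averaging) after `steps` simultaneous updates."""
    x, y = [0.8, 0.2], [0.3, 0.7]  # arbitrary non-equilibrium start (assumption)
    prev_gx, prev_gy = loss_x(y), loss_y(x)
    for _ in range(steps):
        gx, gy = loss_x(y), loss_y(x)
        if optimistic:
            # Optimistic correction: step along 2*g_t - g_{t-1}.
            ux = [2 * a - b for a, b in zip(gx, prev_gx)]
            uy = [2 * a - b for a, b in zip(gy, prev_gy)]
        else:
            ux, uy = gx, gy
        x, y = mwu_step(x, ux, eta), mwu_step(y, uy, eta)
        prev_gx, prev_gy = gx, gy
    return x, y

x_opt, y_opt = run(optimistic=True)
x_mwu, y_mwu = run(optimistic=False)
print("optimistic last-iterate gap:", duality_gap(x_opt, y_opt))
print("plain MWU last-iterate gap: ", duality_gap(x_mwu, y_mwu))
```

In this toy setting the optimistic last iterate approaches the uniform equilibrium, while the plain multiplicative-weights last iterate keeps cycling away from it and would need averaging to produce a good strategy. This only illustrates the normal-form phenomenon that motivates the paper; the treeplex and dilated-regularizer versions studied there are not reproduced here.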
Author Information
Chung-Wei Lee (University of Southern California)
Christian Kroer (Columbia University)
Haipeng Luo (University of Southern California)
More from the Same Authors
- 2022: Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games
  Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo
- 2022: A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
  Samuel Sokota · Ryan D'Orazio · J. Zico Kolter · Nicolas Loizou · Marc Lanctot · Ioannis Mitliagkas · Noam Brown · Christian Kroer
- 2023 Poster: Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms
  Tiancheng Jin · Junyan Liu · Haipeng Luo
- 2023 Poster: Regret Matching$^+$: (In)Stability and Fast Convergence in Games
  Gabriele Farina · Julien Grand-Clément · Christian Kroer · Chung-Wei Lee · Haipeng Luo
- 2023 Poster: Practical Contextual Bandits with Feedback Graphs
  Mengxiao Zhang · Yuheng Zhang · Olga Vrousgou · Haipeng Luo · Paul Mineiro
- 2023 Poster: Context-lumpable stochastic bandits
  Chung-Wei Lee · Qinghua Liu · Yasin Abbasi Yadkori · Chi Jin · Tor Lattimore · Csaba Szepesvari
- 2023 Poster: Block-Coordinate Methods and Restarting for Solving Extensive-Form Games
  Darshan Chakrabarti · Jelena Diakonikolas · Christian Kroer
- 2023 Poster: Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games
  Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng
- 2023 Poster: No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
  William Chang · Tiancheng Jin · Junyan Liu · Haipeng Luo · Chloé Rouyer · Chen-Yu Wei
- 2022 Spotlight: Lightning Talks 4A-2
  Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng
- 2022 Spotlight: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
  Yan Dai · Haipeng Luo · Liyu Chen
- 2022 Poster: Nonstationary Dual Averaging and Online Fair Allocation
  Luofeng Liao · Yuan Gao · Christian Kroer
- 2022 Poster: Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
  Liyu Chen · Haipeng Luo
- 2022 Poster: Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games
  Ioannis Anagnostides · Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo · Tuomas Sandholm
- 2022 Poster: Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
  Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg
- 2022 Poster: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
  Yan Dai · Haipeng Luo · Liyu Chen
- 2022 Poster: Optimal Efficiency-Envy Trade-Off via Optimal Transport
  Steven Yin · Christian Kroer
- 2022 Poster: Near-Optimal No-Regret Learning Dynamics for General Convex Games
  Gabriele Farina · Ioannis Anagnostides · Haipeng Luo · Chung-Wei Lee · Christian Kroer · Tuomas Sandholm
- 2021 Poster: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
  Tiancheng Jin · Longbo Huang · Haipeng Luo
- 2021 Poster: Online Market Equilibrium with Application to Fair Division
  Yuan Gao · Alex Peysakhovich · Christian Kroer
- 2021 Poster: Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
  Liyu Chen · Mehdi Jafarnia-Jahromi · Rahul Jain · Haipeng Luo
- 2021 Poster: Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving
  Julien Grand-Clément · Christian Kroer
- 2021 Poster: Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
  Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee
- 2021 Oral: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
  Tiancheng Jin · Longbo Huang · Haipeng Luo
- 2020 Poster: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang
- 2020 Poster: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
  Tiancheng Jin · Haipeng Luo
- 2020 Poster: Evaluating and Rewarding Teamwork Using Cooperative Game Abstractions
  Tom Yan · Christian Kroer · Alexander Peysakhovich
- 2020 Spotlight: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
  Tiancheng Jin · Haipeng Luo
- 2020 Oral: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang
- 2020 Poster: First-Order Methods for Large-Scale Market Equilibrium Computation
  Yuan Gao · Christian Kroer
- 2020 Poster: Comparator-Adaptive Convex Bandits
  Dirk van der Hoeven · Ashok Cutkosky · Haipeng Luo
- 2019 Poster: Equipping Experts/Bandits with Long-term Memory
  Kai Zheng · Haipeng Luo · Ilias Diakonikolas · Liwei Wang
- 2019 Poster: Model Selection for Contextual Bandits
  Dylan Foster · Akshay Krishnamurthy · Haipeng Luo
- 2019 Spotlight: Model Selection for Contextual Bandits
  Dylan Foster · Akshay Krishnamurthy · Haipeng Luo
- 2019 Poster: Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions
  Gabriele Farina · Christian Kroer · Tuomas Sandholm
- 2019 Poster: Robust Multi-agent Counterfactual Prediction
  Alexander Peysakhovich · Christian Kroer · Adam Lerer
- 2019 Poster: Hypothesis Set Stability and Generalization
  Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan