In zero-sum games, a Nash equilibrium (NE) strategy tends to be overly conservative when confronted with opponents of limited rationality, because it does not actively exploit their weaknesses. Conversely, best responding to an estimated opponent model is vulnerable to estimation errors and lacks safety guarantees. Inspired by the recent success of real-time search algorithms in developing superhuman AI, we investigate the dilemma between safety and opponent exploitation and present a novel real-time search framework, called Safe Exploitation Search (SES), which continuously interpolates between these two extremes of online strategy refinement. We prove that SES has a theoretically upper-bounded exploitability and a lower-bounded evaluation performance. Additionally, SES enables computationally efficient online adaptation to a possibly updating opponent model, whereas previous safe exploitation methods must recompute over the whole game. Empirical results show that SES significantly outperforms NE baselines and previous algorithms while keeping exploitability low.
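The SES algorithm itself is not specified in this abstract. As a toy illustration of the safety–exploitation trade-off it describes, one can interpolate between a safe NE strategy and a best response to an estimated opponent model in a one-shot matrix game; all function names and the mixing parameter `alpha` below are hypothetical, not the paper's method:

```python
def best_response(opponent_model, payoff):
    """Pure best response to an estimated opponent strategy.

    payoff[i][j]: row player's payoff for (our action i, opponent action j).
    opponent_model: estimated probability of each opponent action.
    """
    values = [sum(p * payoff[i][j] for j, p in enumerate(opponent_model))
              for i in range(len(payoff))]
    best = max(range(len(values)), key=values.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(values))]

def interpolate(ne_strategy, opponent_model, payoff, alpha):
    """Mix a safe NE strategy with an exploitative best response.

    alpha = 0 -> pure NE play (safe, non-exploitative);
    alpha = 1 -> pure best response (maximally exploitative, risky
    if the opponent model is wrong).
    """
    br = best_response(opponent_model, payoff)
    return [(1 - alpha) * s + alpha * b for s, b in zip(ne_strategy, br)]
```

For example, in matching pennies (payoff `[[1, -1], [-1, 1]]`, NE `[0.5, 0.5]`) against an opponent believed to play the first action 70% of the time, `alpha = 0.5` yields the mixed strategy `[0.75, 0.25]`, shifting weight toward the exploitative action while retaining some of the equilibrium's hedging. SES interpolates in a more principled way during online search, with exploitability guarantees this naive mixing lacks.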
Author Information
Mingyang Liu (Tsinghua University)
Chengjie Wu (Tsinghua University)
Qihan Liu (Tsinghua University)
Yansen Jing (Tsinghua University)
Jun Yang (Tsinghua University)
Pingzhong Tang (Tsinghua University)
Chongjie Zhang (Tsinghua University)
More from the Same Authors
- 2021 Spotlight: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning »
  Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao
- 2022 Poster: LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning »
  Xi Chen · Ali Ghadirzadeh · Tianhe Yu · Jianhao Wang · Alex Yuan Gao · Wenzhe Li · Liang Bin · Chelsea Finn · Chongjie Zhang
- 2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
  Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han
- 2023 Poster: Conservative Offline Policy Adaptation in Multi-Agent Games »
  Chengjie Wu · Pingzhong Tang · Jun Yang · Yujing Hu · Tangjie Lv · Changjie Fan · Chongjie Zhang
- 2022 Poster: Low-Rank Modular Reinforcement Learning via Muscle Synergy »
  Heng Dong · Tonghan Wang · Jiayuan Liu · Chongjie Zhang
- 2022 Poster: Non-Linear Coordination Graphs »
  Yipeng Kang · Tonghan Wang · Qianlan Yang · Xiaoran Wu · Chongjie Zhang
- 2022 Poster: CUP: Critic-Guided Policy Reuse »
  Jin Zhang · Siyuan Li · Chongjie Zhang
- 2021 Poster: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning »
  Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao
- 2021 Poster: Celebrating Diversity in Shared Multi-Agent Reinforcement Learning »
  Chenghao Li · Tonghan Wang · Chengjie Wu · Qianchuan Zhao · Jun Yang · Chongjie Zhang