An AI agent should be able to coordinate with humans to solve tasks. We consider the problem of training a Reinforcement Learning (RL) agent to collaborate with humans without using any human data, i.e., in a zero-shot setting. Standard RL agents learn through self-play. Unfortunately, such agents only know how to collaborate with themselves and normally perform poorly with unseen partners, such as humans. How to train a robust agent in a zero-shot fashion remains an open research question. Motivated by maximum entropy RL, we derive a centralized population entropy objective that facilitates learning a diverse population of agents, which is then used to train a robust agent to collaborate with unseen partners. In the popular Overcooked game environment, the proposed method proves effective compared with baseline methods, including self-play PPO, standard Population-Based Training (PBT), and trajectory diversity-based PBT. We also conduct online experiments with real humans, which further demonstrate the efficacy of the method in the real world. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
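As a rough illustration of what a centralized population entropy measures, the sketch below assumes (our assumption, not a statement of the authors' exact objective) that it is the entropy of the population's mean action distribution at a given state, and that it is added to the task reward with a weight `alpha`. The names `population_entropy`, `entropy_bonus_reward`, and `alpha` are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def population_entropy(policies: Sequence[Sequence[float]]) -> float:
    """Entropy of the population's mean action distribution at one state (assumed definition).

    policies[i][a] is the probability that agent i takes action a in this state.
    """
    n = len(policies)
    num_actions = len(policies[0])
    # Average the agents' action distributions to get the population's mean policy.
    mean_policy = [sum(pi[a] for pi in policies) / n for a in range(num_actions)]
    return entropy(mean_policy)


def entropy_bonus_reward(reward: float, policies: Sequence[Sequence[float]], alpha: float = 0.01) -> float:
    """Task reward plus an alpha-weighted population-entropy bonus (alpha is a hypothetical weight)."""
    return reward + alpha * population_entropy(policies)
```

Two deterministic agents that pick different actions, `[[1.0, 0.0], [0.0, 1.0]]`, give a population entropy of ln 2 (about 0.693), while two identical deterministic agents give 0. Summing the agents' individual entropies would return 0 in both cases, which is why a centralized measure of this kind can reward diversity across the population rather than randomness within each agent.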
Author Information
Rui Zhao (Tencent)
Jinming Song
Hu Haifeng (Tencent AI Platform)
Yang Gao (Tsinghua University)
Yi Wu (OpenAI)
Zhongqian Sun
Wei Yang (Tencent AI Lab)
More from the Same Authors
2021: Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension
Shusheng Xu · Yichen Liu · Xiaoyu Yi · Siyuan Zhou · Huizi Li · Yi Wu
2021: Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
Zihan Zhou · Wei Fu · Bingliang Zhang · Yi Wu
2021: Learning Efficient Multi-Agent Cooperative Visual Exploration
Chao Yu · Jiaxuan Gao · Huazhong Yang · Yu Wang · Yi Wu
2022 Poster: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning
Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · YI WU · Yang Gao · Huazhe Xu
2022 Poster: SCL-WC: Cross-Slide Contrastive Learning for Weakly-Supervised Whole-Slide Image Classification
Xiyue Wang · Jinxi Xiang · Jun Zhang · Sen Yang · Zhongyi Yang · Ming-Hui Wang · Jing Zhang · Wei Yang · Junzhou Huang · Xiao Han
2022 Spotlight: Lightning Talks 5A-3
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng
2022 Spotlight: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning
Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · YI WU · Yang Gao · Huazhe Xu
2022 Poster: Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning
Hua Wei · Jingxiao Chen · Xiyang Ji · Hongyang Qin · Minwen Deng · Siqin Li · Liang Wang · Weinan Zhang · Yong Yu · Liu Linc · Lanxiao Huang · Deheng Ye · Qiang Fu · Wei Yang
2022 Poster: Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions
Weirui Ye · Pieter Abbeel · Yang Gao
2022 Poster: Planning for Sample Efficient Imitation Learning
Zhao-Heng Yin · Weirui Ye · Qifeng Chen · Yang Gao
2022 Poster: An Empirical Study on Disentanglement of Negative-free Contrastive Learning
Jinkun Cao · Ruiqian Nai · Qing Yang · Jialei Huang · Yang Gao
2021 Poster: Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
Jiayu Chen · Yuanxin Zhang · Yuanfan Xu · Huimin Ma · Huazhong Yang · Jiaming Song · Yu Wang · Yi Wu
2021 Poster: Mastering Atari Games with Limited Data
Weirui Ye · Shaohuai Liu · Thanard Kurutach · Pieter Abbeel · Yang Gao
2021 Poster: Learning Diverse Policies in MOBA Games via Macro-Goals
Yiming Gao · Bei Shi · Xueying Du · Liang Wang · Guangwei Chen · Zhenjie Lian · Fuhao Qiu · GUOAN HAN · Weixuan Wang · Deheng Ye · Qiang Fu · Wei Yang · Lanxiao Huang
2021 Poster: NovelD: A Simple yet Effective Exploration Criterion
Tianjun Zhang · Huazhe Xu · Xiaolong Wang · Yi Wu · Kurt Keutzer · Joseph Gonzalez · Yuandong Tian
2021 Poster: Reinforcement Learning with Latent Flow
Wenling Shang · Xiaofei Wang · Aravind Srinivas · Aravind Rajeswaran · Yang Gao · Pieter Abbeel · Misha Laskin
2020 Poster: Towards Playing Full MOBA Games with Deep Reinforcement Learning
Deheng Ye · Guibin Chen · Wen Zhang · Sheng Chen · Bo Yuan · Bo Liu · Jia Chen · Zhao Liu · Fuhao Qiu · Hongsheng Yu · Yinyuting Yin · Bei Shi · Liang Wang · Tengfei Shi · Qiang Fu · Wei Yang · Lanxiao Huang · Wei Liu
2020 Poster: Fighting Copycat Agents in Behavioral Cloning from Observation Histories
Chuan Wen · Jierui Lin · Trevor Darrell · Dinesh Jayaraman · Yang Gao
2018: Poster Session 1 + Coffee
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang