Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination
Rui Zhao · Jinming Song · Haifeng Hu · Yang Gao · Yi Wu · Zhongqian Sun · Wei Yang
Event URL: https://openreview.net/forum?id=0DybzWMUgjj

An AI agent should be able to coordinate with humans to solve tasks. We consider the problem of training a Reinforcement Learning (RL) agent without using any human data, i.e., in a zero-shot setting, to make it capable of collaborating with humans. Standard RL agents learn through self-play. Unfortunately, such agents only know how to collaborate with themselves and typically perform poorly with unseen partners, such as humans. How to train a robust agent in a zero-shot fashion remains an open research question. Motivated by maximum entropy RL, we derive a centralized population entropy objective that facilitates learning a diverse population of agents, which is then used to train a robust agent to collaborate with unseen partners. The proposed method outperforms baseline methods, including self-play PPO, standard Population-Based Training (PBT), and trajectory-diversity-based PBT, in the popular Overcooked game environment. We also conduct online experiments with real humans, further demonstrating the efficacy of the method in the real world. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
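
To make the idea of a centralized population entropy objective concrete, below is a minimal Python sketch. It assumes the population entropy is measured as the entropy of the population's mean (mixture) policy at a given state, so a population of near-identical policies scores low while a behaviorally diverse population scores high; the function and variable names are hypothetical, and the paper's exact objective may differ.

    # Minimal sketch of a centralized population entropy bonus (assumption:
    # entropy of the mean policy over the population; not necessarily the
    # paper's exact formulation).
    import numpy as np

    def population_entropy(action_probs: np.ndarray) -> float:
        """Entropy of the population's mean policy.

        action_probs: array of shape (n_agents, n_actions); each row is one
        agent's action distribution at the current state.
        """
        mean_policy = action_probs.mean(axis=0)         # centralized mixture
        mean_policy = np.clip(mean_policy, 1e-12, 1.0)  # numerical safety
        return float(-(mean_policy * np.log(mean_policy)).sum())

    # Three near-identical agents yield low population entropy; a behaviorally
    # diverse population yields higher entropy.
    similar = np.array([[0.9, 0.1], [0.88, 0.12], [0.92, 0.08]])
    diverse = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
    print(population_entropy(similar))  # ~0.33 nats (low)
    print(population_entropy(diverse))  # ~0.69 nats (high)

In a PBT-style loop, such a bonus could be added to each agent's reward to push the population toward diverse behaviors before training a robust best-response agent against that population.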

Author Information

Rui Zhao (Tencent)
Jinming Song
Haifeng Hu (Tencent AI Platform)
Yang Gao (Tsinghua University)
Yi Wu (OpenAI)
Zhongqian Sun
Wei Yang (Tencent AI Lab)
