We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones. To encourage the learning policy to consistently converge towards a previously undiscovered local optimum, RSPO switches between extrinsic and intrinsic rewards via a trajectory-based novelty measurement during the optimization process. When a sampled trajectory is sufficiently distinct, RSPO performs standard policy optimization with extrinsic rewards. For trajectories with high likelihood under existing policies, RSPO utilizes an intrinsic diversity reward to promote exploration. Experiments show that RSPO is able to discover a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraft II challenges.
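The switching rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of mean negative log-likelihood as the novelty measure, and the single `threshold` parameter are all assumptions made for illustration, and the intrinsic rewards are taken as given rather than computed.

```python
from typing import List, Sequence


def trajectory_nll(step_log_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of a trajectory's actions under one
    existing policy (assumed novelty measure; higher means more novel)."""
    return -sum(step_log_probs) / len(step_log_probs)


def switch_rewards(
    extrinsic: Sequence[float],
    intrinsic: Sequence[float],
    log_probs_per_policy: Sequence[Sequence[float]],
    threshold: float,
) -> List[float]:
    """Pick the reward signal for one sampled trajectory.

    log_probs_per_policy[i][t] is the log-probability that existing
    policy i assigns to the action taken at step t (hypothetical input).
    """
    # The trajectory counts as "sufficiently distinct" only if it is
    # unlikely under every previously discovered policy.
    is_novel = all(
        trajectory_nll(lp) >= threshold for lp in log_probs_per_policy
    )
    # Novel trajectory: optimize the task reward as usual.
    # Familiar trajectory: use the diversity reward to push exploration.
    return list(extrinsic) if is_novel else list(intrinsic)


extrinsic = [1.0, 0.0, 2.0]
intrinsic = [0.1, 0.2, 0.3]
novel = switch_rewards(extrinsic, intrinsic, [[-3.0, -3.0, -3.0]], threshold=1.0)
familiar = switch_rewards(extrinsic, intrinsic, [[-0.1, -0.1, -0.1]], threshold=1.0)
```

In this sketch, `novel` receives the extrinsic rewards and `familiar` receives the intrinsic ones. With no existing policies, every trajectory is treated as novel, which matches ordinary policy optimization for the first discovered strategy.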
Author Information
Zihan Zhou (Shanghai Qi Zhi Institute)
Wei Fu (Institute for Interdisciplinary Information Sciences, Tsinghua University)
Bingliang Zhang (Tsinghua University)
Yi Wu (OpenAI)
More from the Same Authors
- 2021: Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension
  Shusheng Xu · Yichen Liu · Xiaoyu Yi · Siyuan Zhou · Huizi Li · Yi Wu
- 2021: Learning Efficient Multi-Agent Cooperative Visual Exploration
  Chao Yu · Jiaxuan Gao · Huazhong Yang · Yu Wang · Yi Wu
- 2021: Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination
  Rui Zhao · Jinming Song · Hu Haifeng · Yang Gao · Yi Wu · Zhongqian Sun · Wei Yang
- 2021 Poster: Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
  Jiayu Chen · Yuanxin Zhang · Yuanfan Xu · Huimin Ma · Huazhong Yang · Jiaming Song · Yu Wang · Yi Wu
- 2021 Poster: NovelD: A Simple yet Effective Exploration Criterion
  Tianjun Zhang · Huazhe Xu · Xiaolong Wang · Yi Wu · Kurt Keutzer · Joseph Gonzalez · Yuandong Tian