
A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents
YAN ZHENG · Zhaopeng Meng · Jianye Hao · Zongzhang Zhang · Tianpei Yang · Changjie Fan

Wed Dec 05 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #147

In multiagent domains, coping with non-stationary agents that change their behavior from time to time is a challenging problem: an agent must quickly detect the other agent's policy during online interaction and then adapt its own policy accordingly. This paper studies efficient policy detection and reuse techniques for playing against non-stationary agents in Markov games. We propose a new deep BPR+ algorithm that extends the recent BPR+ algorithm with a neural network as the value-function approximator. To detect policies accurately, we propose the rectified belief model, which takes advantage of an opponent model to infer the other agent's policy from both reward signals and its behaviors. Instead of directly storing individual policies as BPR+ does, we introduce a distilled policy network that serves as the policy library, using policy distillation to achieve efficient online policy learning and reuse. Deep BPR+ inherits all the advantages of BPR+ and empirically shows better performance in terms of detection accuracy, cumulative reward, and speed of convergence compared to existing algorithms in complex Markov games with raw visual inputs.
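
The belief-based detection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the Gaussian performance model, and the greedy selection rule are all assumptions made for illustration (BPR-style methods typically use more careful exploration heuristics). It shows one Bayes update of a belief over candidate opponent policies, where the likelihood of the observed return is optionally multiplied by a behavior likelihood from an opponent model, which is the general shape a "rectified" belief might take.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x (assumed form of the performance model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, reward_likelihoods, behavior_likelihoods=None):
    """One Bayes step over candidate opponent policies.

    belief[j]               prior probability that the opponent uses policy j
    reward_likelihoods[j]   likelihood of the observed return under policy j
    behavior_likelihoods[j] likelihood of the observed opponent actions under an
                            opponent model of policy j (hypothetical input;
                            omitted means a reward-only update)
    """
    if behavior_likelihoods is None:
        behavior_likelihoods = [1.0] * len(belief)
    unnormalized = [
        b * r * o
        for b, r, o in zip(belief, reward_likelihoods, behavior_likelihoods)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        # No candidate explains the evidence; keep the prior unchanged.
        return list(belief)
    return [u / total for u in unnormalized]


def select_policy(belief, expected_utility):
    """Greedy choice: index of the response policy with the highest
    belief-weighted expected utility.

    expected_utility[i][j] is the expected return of response policy i
    against opponent policy j.
    """
    scores = [sum(b * u for b, u in zip(belief, row)) for row in expected_utility]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage with made-up numbers: two candidate opponent policies.
belief = [0.5, 0.5]
performance_model = [(10.0, 2.0), (2.0, 2.0)]  # (mean, std) of return per opponent
observed_return = 9.0
reward_lik = [gaussian_pdf(observed_return, m, s) for m, s in performance_model]
behavior_lik = [0.8, 0.3]  # hypothetical opponent-model likelihoods
belief = update_belief(belief, reward_lik, behavior_lik)
best = select_policy(belief, [[10.0, 2.0], [1.0, 8.0]])
```

In this toy setting the observed return and the observed behavior both point to the first opponent policy, so the belief concentrates there and the first response policy is selected. Combining the two evidence sources is useful when different opponent policies yield similar returns, since reward signals alone would then leave the belief ambiguous.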

Author Information

YAN ZHENG (Tianjin University)
Zhaopeng Meng (School of Computer Software, Tianjin University)
Jianye Hao (Tianjin University)
Zongzhang Zhang (Soochow University)
Tianpei Yang (Tianjin University)
Changjie Fan (Netease)
