Timezone: »

Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search
Aijun Bai · Feng Wu · Xiaoping Chen

Sat Dec 07 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor #None

Monte-Carlo tree search is drawing great interest in the domain of planning under uncertainty, particularly when little or no domain knowledge is available. One of the central problems is the trade-off between exploration and exploitation. In this paper we present a novel Bayesian mixture modelling and inference based Thompson sampling approach to addressing this dilemma. The proposed Dirichlet-NormalGamma MCTS (DNG-MCTS) algorithm represents the uncertainty of the accumulated reward for actions in the MCTS search tree as a mixture of Normal distributions and inferences on it in Bayesian settings by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions. Thompson sampling is used to select the best action at each decision node. Experimental results show that our proposed algorithm has achieved the state-of-the-art comparing with popular UCT algorithm in the context of online planning for general Markov decision processes.

Author Information

Aijun Bai (University of Science and Technology of China (USTC))
Feng Wu (University of Southampton)
Xiaoping Chen (University of Science and Technology of China (USTC))