NeurIPS Poster Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms


Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms

Jiechao Guan · Hui Xiong

Abstract: Hierarchical Bayesian bandit refers to the multi-task bandit problem in which bandit tasks are assumed to be drawn from the same distribution. In this work, we provide improved Bayes regret bounds for hierarchical Bayesian bandit algorithms in the multi-task linear bandit and semi-bandit settings. For the multi-task linear bandit, we first analyze the preexisting hierarchical Thompson sampling (HierTS) algorithm, and improve its gap-independent Bayes regret bound from $O(m\sqrt{n\log{n}\log{(mn)}})$ to $O(m\sqrt{n\log{n}})$ in the case of an infinite action set, with $m$ being the number of tasks and $n$ the number of iterations per task. In the case of a finite action set, we propose a novel hierarchical Bayesian bandit algorithm, named hierarchical BayesUCB (HierBayesUCB), that achieves the logarithmic but gap-dependent regret bound $O(m\log{(mn)}\log{n})$ under mild assumptions. All of the above regret bounds hold in many variants of the hierarchical Bayesian linear bandit problem, including when the tasks are solved sequentially or concurrently. Furthermore, we extend the aforementioned HierTS and HierBayesUCB algorithms to the multi-task combinatorial semi-bandit setting. Concretely, our combinatorial HierTS algorithm attains a Bayes regret bound of $O(m\sqrt{n}\log{n})$, comparable to the latest known bound. Moreover, our combinatorial HierBayesUCB yields a sharper Bayes regret bound of $O(m\log{(mn)}\log{n})$. Experiments are conducted to validate the soundness of our theoretical results for multi-task bandit algorithms.
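To get a feel for how the rates quoted in the abstract compare, the following is a minimal sketch that evaluates the expressions inside the $O(\cdot)$ bounds for a few values of $m$ and $n$. The function names and the sample values are illustrative assumptions, and all hidden constants (including any gap-dependent factors) are ignored, so the numbers indicate growth rates only, not actual regret.

```python
import math


def old_hierts_rate(m: int, n: int) -> float:
    """Expression inside the earlier HierTS bound: m * sqrt(n log n log(mn))."""
    return m * math.sqrt(n * math.log(n) * math.log(m * n))


def new_hierts_rate(m: int, n: int) -> float:
    """Expression inside the improved HierTS bound: m * sqrt(n log n)."""
    return m * math.sqrt(n * math.log(n))


def hierbayesucb_rate(m: int, n: int) -> float:
    """Expression inside the gap-dependent bound: m * log(mn) * log n."""
    return m * math.log(m * n) * math.log(n)


if __name__ == "__main__":
    # Hypothetical (m, n) pairs chosen only for illustration.
    for m, n in [(10, 1_000), (100, 10_000), (1_000, 100_000)]:
        old = old_hierts_rate(m, n)
        new = new_hierts_rate(m, n)
        ucb = hierbayesucb_rate(m, n)
        # The ratio old / new simplifies algebraically to sqrt(log(mn)).
        print(f"m={m:>5} n={n:>7}  old={old:.3e}  new={new:.3e}  "
              f"old/new={old / new:.3f}  gap-dependent={ucb:.3e}")
```

The ratio between the earlier and the improved HierTS expressions is exactly $\sqrt{\log(mn)}$, so the saving grows slowly as the number of tasks or the number of iterations per task increases.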
