Skip to yearly menu bar Skip to main content


Poster

On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice

Shinji Ito

[ ]
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: This paper examines two extensions of multi-armed bandit problems: multi-armed bandits with expert advice and contextual linear bandits. For the former problem, multi-armed bandits with expert advice, the previously known best upper and lower bounds have been $O(\sqrt{KT \log \frac{N}{K} })$ and $\Omega( \sqrt{KT \frac{ \log N }{\log K }} )$, respectively. Here, $K$, $N$, and $T$ represent the numbers of arms, experts, and rounds, respectively. This paper closes the gap between these bounds by presenting a matching lower bound of $\Omega( \sqrt{KT \log \frac{N}{K}} )$. For the latter problem, contextual linear bandits, we provide an algorithm that achieves $O ( \sqrt{d T \log ( K \min\{ 1, \frac{S}{d} \} )} )$ together with a matching lower bound, where $d$ and $S$ represent the dimensionality of feature vectors and the size of the context space, respectively.

Live content is unavailable. Log in and register to view live content