Timezone: »

Fighting Bandits with a New Kind of Smoothness
Jacob D Abernethy · Chansoo Lee · Ambuj Tewari

Thu Dec 10 08:00 AM -- 12:00 PM (PST) @ 210 C #95
We focus on the adversarial multi-armed bandit problem. The EXP3 algorithm of Auer et al. (2003) was shown to have a regret bound of $O(\sqrt{T N \log N})$, where $T$ is the time horizon and $N$ is the number of available actions (arms). More recently, Audibert and Bubeck (2009) improved the bound by a logarithmic factor via an entirely different method. In the present work, we provide a new set of analysis tools, using the notion of convex smoothing, to provide several novel algorithms with optimal guarantees. First we show that regularization via the Tsallis entropy matches the minimax rate of Audibert and Bubeck (2009) with an even tighter constant; it also fully generalizes EXP3. Second we show that a wide class of perturbation methods lead to near-optimal bandit algorithms as long as a simple condition on the perturbation distribution $\mathcal{D}$ is met: one needs that the hazard function of $\mathcal{D}$ remain bounded. The Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property; interestingly, the Gaussian and Uniform distributions do not.

Author Information

Jacob D Abernethy (University of Michigan)
Chansoo Lee (University of Michigan)
Ambuj Tewari (University of Michigan)

More from the Same Authors