Timezone: »

 
Regret Minimization in Heavy-Tailed Bandits
Shubhada Agrawal · Sandeep Juneja · Wouter Koolen
Event URL: https://eventhosts.gather.town/app/kR7ip0Bhhn8BXuMD/wiml-workshop-2021 »

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order (1 + \epsilon) are uniformly bounded by a known constant B, for some given  \epsilon > 0. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.

Author Information

Shubhada Agrawal (TIFR Mumbai)
Sandeep Juneja (Tata Institute of Fundamental Research)
Wouter Koolen (Centrum Wiskunde & Informatica, Amsterdam)

More from the Same Authors