We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings, where the reward distributions either have bounded support or belong to a single-parameter exponential family. We work under the much weaker assumption that the moments of order (1 + \epsilon) are uniformly bounded by a known constant B, for some given \epsilon > 0. We propose an optimal algorithm that matches the lower bound exactly in the first-order term, and we give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be expensive. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. This provides a controlled trade-off between statistical optimality and computational cost.
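For context, the truncated empirical mean that the abstract uses as a point of comparison can be sketched as below. This is the baseline estimator, not the index proposed in the paper. The threshold schedule (B t / log(1/\delta))^{1/(1 + \epsilon)} is the form commonly used in the heavy-tailed bandit literature and is an assumption here, as are the function name `truncated_mean` and the confidence parameter `delta`.

```python
import math


def truncated_mean(samples, B, eps, delta):
    """Truncated empirical mean, a baseline estimator for heavy-tailed data.

    Assumes E|X|^(1 + eps) <= B. The t-th sample is kept only if its
    magnitude is at most (B * t / log(1 / delta)) ** (1 / (1 + eps));
    otherwise it contributes zero. The sum is still divided by the full
    sample count n. The threshold schedule is an assumed, commonly used
    choice, not taken from the paper described above.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")

    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        threshold = (B * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x  # sample is within the truncation level, keep it
    return total / n


if __name__ == "__main__":
    # One large outlier (100.0) is discarded; the other samples are kept.
    data = [1.0, -1.0, 100.0, 3.0]
    print(truncated_mean(data, B=4.0, eps=1.0, delta=math.exp(-1)))
```

The truncation level grows with t, so early samples are clipped aggressively and later ones less so. This limits the influence of rare large rewards at the cost of a small bias.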
Author Information
Shubhada Agrawal (TIFR Mumbai)
Sandeep Juneja (Tata Institute of Fundamental Research)
Wouter Koolen (Centrum Wiskunde & Informatica, Amsterdam)
More from the Same Authors
- 2022 Poster: Luckiness in Multiscale Online Learning
  Wouter Koolen · Muriel F. Pérez-Ortiz
- 2021: Contributed talk #1 – Regret minimization in heavy-tailed bandits, Shubhada Agrawal
  Shubhada Agrawal
- 2021 Poster: A/B/n Testing with Control in the Presence of Subpopulations
  Yoan Russac · Christina Katsimerou · Dennis Bohle · Olivier Cappé · Aurélien Garivier · Wouter Koolen
- 2021 Poster: Optimal Best-Arm Identification Methods for Tail-Risk Measures
  Shubhada Agrawal · Wouter Koolen · Sandeep Juneja
- 2019 Poster: Pure Exploration with Multiple Correct Answers
  Rémy Degenne · Wouter Koolen
- 2019 Poster: Non-Asymptotic Pure Exploration by Solving Games
  Rémy Degenne · Wouter Koolen · Pierre Ménard
- 2018 Poster: Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
  Emilie Kaufmann · Wouter Koolen · Aurélien Garivier
- 2017 Poster: Random Permutation Online Isotonic Regression
  Wojciech Kotlowski · Wouter Koolen · Alan Malek
- 2017 Poster: Monte-Carlo Tree Search by Best Arm Identification
  Emilie Kaufmann · Wouter Koolen
- 2017 Spotlight: Monte-Carlo Tree Search by Best Arm Identification
  Emilie Kaufmann · Wouter Koolen
- 2016 Poster: Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
  Wouter Koolen · Peter Grünwald · Tim van Erven
- 2016 Poster: MetaGrad: Multiple Learning Rates in Online Learning
  Tim van Erven · Wouter Koolen
- 2016 Oral: MetaGrad: Multiple Learning Rates in Online Learning
  Tim van Erven · Wouter Koolen
- 2015: Discussion Panel
  Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu
- 2015 Workshop: Learning Faster from Easy Data II
  Tim van Erven · Wouter Koolen
- 2015 Poster: Minimax Time Series Prediction
  Wouter Koolen · Alan Malek · Peter Bartlett · Yasin Abbasi Yadkori
- 2014 Poster: Efficient Minimax Strategies for Square Loss Games
  Wouter M Koolen · Alan Malek · Peter Bartlett
- 2014 Poster: Learning the Learning Rate for Prediction with Expert Advice
  Wouter M Koolen · Tim van Erven · Peter Grünwald
- 2013 Workshop: Learning Faster From Easy Data
  Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck
- 2013 Workshop: Large Scale Matrix Analysis and Inference
  Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht
- 2013 Poster: The Pareto Regret Frontier
  Wouter M Koolen
- 2012 Poster: Putting Bayes to sleep
  Wouter M Koolen · Dmitri Adamskiy · Manfred K. Warmuth
- 2012 Spotlight: Putting Bayes to sleep
  Wouter M Koolen · Dmitri Adamskiy · Manfred K. Warmuth
- 2011 Poster: Adaptive Hedge
  Tim van Erven · Peter Grünwald · Wouter M Koolen · Steven D Rooij
- 2011 Poster: Learning Eigenvectors for Free
  Wouter M Koolen · Wojciech Kotlowski · Manfred K. Warmuth