We study a regret minimization problem in the multi-armed bandit setting where multiple best or near-optimal arms exist. We consider the case where the number of arms/actions is comparable to or much larger than the time horizon, and make \emph{no} assumptions about the structure of the bandit instance. Our goal is to design algorithms that automatically adapt to the \emph{unknown} hardness of the problem, i.e., the number of best arms. Our setting captures many modern applications of bandit algorithms where the action space is enormous and information about the underlying instance/structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and theoretically derive its regret bound. We then prove a lower bound for our problem setting, which indicates that: (1) no algorithm can be minimax optimal simultaneously over all hardness levels; and (2) our algorithm achieves a rate function that is Pareto optimal. With additional knowledge of the expected reward of the best arm, we propose another adaptive algorithm that is minimax optimal, up to polylog factors, over \emph{all} hardness levels. Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state of the art.
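To make the setting concrete, the following is a minimal Python sketch of a bandit instance with many arms and several best arms, together with a simple baseline that runs UCB1 on a uniformly random subset of arms. It is an illustration only and is not the algorithm proposed in the paper. The function names (`make_instance`, `subsampled_ucb`), the Bernoulli rewards, and all numeric values are assumptions chosen for the example.

```python
import math
import random


def make_instance(n_arms, n_best, best_mean=0.9, other_mean=0.5):
    """Hypothetical instance: n_best arms share the top mean, the rest are worse."""
    return [best_mean] * n_best + [other_mean] * (n_arms - n_best)


def subsampled_ucb(means, horizon, subset_size, rng):
    """Run UCB1 on a uniformly random subset of arms and return the pseudo-regret.

    This is a generic baseline, assumed here for illustration only.
    """
    subset = rng.sample(range(len(means)), subset_size)
    counts = [0] * subset_size
    sums = [0.0] * subset_size
    best = max(means)  # regret is measured against the best arm overall
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= subset_size:
            i = t - 1  # initial round: pull each sampled arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(
                range(subset_size),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        arm = subset[i]
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        counts[i] += 1
        sums[i] += reward
        regret += best - means[arm]
    return regret


# More arms than rounds; the number of best arms is unknown to the learner.
means = make_instance(n_arms=2000, n_best=200)
for k in (5, 30):
    r = subsampled_ucb(means, horizon=1000, subset_size=k, rng=random.Random(0))
    print(f"subset size {k}: pseudo-regret {r:.1f}")
```

In this sketch the subset size controls a trade-off: a small subset may contain no best arm at all, while a large subset spends more rounds exploring. The right size depends on the number of best arms, which is exactly the quantity assumed unknown in the abstract.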
Author Information
Yinglun Zhu (University of Wisconsin-Madison)
Robert Nowak (University of Wisconsin-Madison)
More from the Same Authors
- 2022 : A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets
  Liu Yang · Jifan Zhang · Joseph Shenouda · Dimitris Papailiopoulos · Kangwook Lee · Robert Nowak
- 2023 Poster: Algorithm Selection for Deep Active Learning with Imbalanced Datasets
  Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak
- 2023 Poster: Multi-task Representation Learning for Pure Exploration in Bilinear Bandits
  Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- 2022 : Panel
  Mayee Chen · Alexander Ratner · Robert Nowak · Cody Coleman · Ramya Korlakai Vinayak
- 2022 Poster: Efficient Active Learning with Abstention
  Yinglun Zhu · Robert Nowak
- 2022 Poster: Active Learning with Neural Networks: Insights from Nonparametric Statistics
  Yinglun Zhu · Robert Nowak
- 2022 Poster: One for All: Simultaneous Metric and Preference Learning over Multiple Users
  Gregory Canal · Blake Mason · Ramya Korlakai Vinayak · Robert Nowak
- 2021 Poster: Pure Exploration in Kernel and Neural Bandits
  Yinglun Zhu · Dongruo Zhou · Ruoxi Jiang · Quanquan Gu · Rebecca Willett · Robert Nowak
- 2020 : Dataset Curation via Active Learning
  Robert Nowak
- 2020 Poster: Finding All $\epsilon$-Good Arms in Stochastic Bandits
  Blake Mason · Lalit Jain · Ardhendu Tripathy · Robert Nowak
- 2019 Poster: Learning Nearest Neighbor Graphs from Noisy Distance Samples
  Blake Mason · Ardhendu Tripathy · Robert Nowak
- 2019 Poster: MaxGap Bandit: Adaptive Algorithms for Approximate Ranking
  Sumeet Katariya · Ardhendu Tripathy · Robert Nowak
- 2017 Poster: Scalable Generalized Linear Bandits: Online Computation and Hashing
  Kwang-Sung Jun · Aniruddha Bhargava · Robert Nowak · Rebecca Willett
- 2017 Poster: A KL-LUCB algorithm for Large-Scale Crowdsourcing
  Ervin Tanczos · Robert Nowak · Bob Mankoff
- 2017 Poster: Learning Low-Dimensional Metrics
  Blake Mason · Lalit Jain · Robert Nowak