This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed-confidence and fixed-budget settings. Leveraging ideas from the theory of suprema of empirical processes, we provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms. Unlike previous approaches, which sample to minimize a worst-case variance (e.g., G-optimal design), we define an experimental design objective based on the Gaussian width of the underlying arm set. We provide a novel lower bound in terms of this objective that highlights its fundamental role in the sample complexity. The sample complexity of our fixed-confidence algorithm matches this lower bound; moreover, the algorithm is computationally efficient for combinatorial classes such as shortest paths, matchings, and matroids, where the arm set can be exponentially large in the dimension. Finally, we propose the first algorithm for linear bandits in the fixed-budget setting; its guarantee matches our lower bound up to logarithmic factors.
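To make the contrast between the two design criteria concrete, the sketch below compares a Monte Carlo estimate of a Gaussian-width style objective with the classical G-optimal (worst-case variance) criterion for a fixed design over a finite arm set. This is a minimal sketch, not the paper's algorithm: it assumes the objective takes the form E_{η∼N(0,I)}[max_x x^T A(λ)^{-1/2} η] with A(λ) = Σ_x λ_x x x^T, and all function names and the toy arm set are illustrative.

```python
import numpy as np

def design_cov(X, lam):
    """A(lam) = sum_x lam_x * x x^T for a design lam over the rows of X."""
    return (X * lam[:, None]).T @ X

def g_optimal_value(X, lam):
    """Classical G-optimal criterion: worst-case variance max_x x^T A(lam)^{-1} x."""
    A_inv = np.linalg.inv(design_cov(X, lam))
    return max(x @ A_inv @ x for x in X)

def gaussian_width_value(X, lam, n_mc=2000, seed=None):
    """Monte Carlo estimate of an assumed Gaussian-width style objective:
    E_{eta ~ N(0, I)} [ max_x x^T A(lam)^{-1/2} eta ]."""
    rng = np.random.default_rng(seed)
    A = design_cov(X, lam)
    # Symmetric inverse square root via eigendecomposition (A assumed full rank).
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    etas = rng.standard_normal((n_mc, X.shape[1]))
    # For each Gaussian draw, take the sup over arms, then average.
    return np.mean(np.max(X @ A_inv_sqrt @ etas.T, axis=0))

# Toy arm set: 50 random arms in 5 dimensions, uniform design weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
lam = np.full(50, 1 / 50)
print("G-optimal value:", g_optimal_value(X, lam))
print("Gaussian-width value (MC):", gaussian_width_value(X, lam, seed=1))
```

The Gaussian-width objective averages the supremum of a Gaussian process over the arm set rather than taking a worst case, which is one way the geometry of the instance can enter without an explicit union bound over arms.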
Author Information
Julian Katz-Samuels (University of Washington)
Lalit Jain (University of Washington)
Zohar Karnin (Amazon)
Kevin Jamieson (University of Washington)
More from the Same Authors
- 2023: LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
  Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak
- 2023 Poster: Optimal Exploration for Model-Based RL in Nonlinear Systems
  Andrew Wagenmaker · Guanya Shi · Kevin Jamieson
- 2023 Poster: Active representation learning for general task space with applications in robotics
  Yifang Chen · Yingbing Huang · Simon Du · Kevin Jamieson · Guanya Shi
- 2023 Poster: Experimental Designs for Heteroskedastic Variance
  Justin Weltz · Tanner Fiez · Alexander Volfovsky · Eric Laber · Blake Mason · Houssam Nassif · Lalit Jain
- 2022 Poster: Active Learning with Safety Constraints
  Romain Camilleri · Andrew Wagenmaker · Jamie Morgenstern · Lalit Jain · Kevin Jamieson
- 2022 Poster: Instance-optimal PAC Algorithms for Contextual Bandits
  Zhaoqi Li · Lillian Ratliff · Houssam Nassif · Kevin Jamieson · Lalit Jain
- 2022 Poster: Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design
  Andrew Wagenmaker · Kevin Jamieson
- 2021: Beyond No Regret: Instance-Dependent PAC Reinforcement Learning
  Andrew Wagenmaker · Kevin Jamieson
- 2021 Poster: Selective Sampling for Online Best-arm Identification
  Romain Camilleri · Zhihan Xiong · Maryam Fazel · Lalit Jain · Kevin Jamieson
- 2021 Poster: Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers
  Julian Katz-Samuels · Blake Mason · Kevin Jamieson · Rob Nowak
- 2021 Poster: Corruption Robust Active Learning
  Yifang Chen · Simon Du · Kevin Jamieson
- 2020 Poster: Finding All $\epsilon$-Good Arms in Stochastic Bandits
  Blake Mason · Lalit Jain · Ardhendu Tripathy · Robert Nowak
- 2019 Poster: A New Perspective on Pool-Based Active Classification and False-Discovery Control
  Lalit Jain · Kevin Jamieson
- 2019 Poster: Sequential Experimental Design for Transductive Linear Bandits
  Lalit Jain · Kevin Jamieson · Tanner Fiez · Lillian Ratliff
- 2019 Poster: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
  Max Simchowitz · Kevin Jamieson
- 2018 Poster: A Bandit Approach to Sequential Experimental Design with False Discovery Control
  Kevin Jamieson · Lalit Jain