Timezone: »
Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration inequalities, which thus lead to over-exploration. In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap. To improve its finite sample performance, we further incorporate second-order correction into the above construction. In theory, we derive both problem-dependent and problem-independent regret bounds for multi-armed bandits under a much weaker tail assumption than the standard sub-Gaussianity. Numerical results demonstrate significant regret reductions by our method, in comparison with several baselines in a range of multi-armed and linear bandit problems.
Author Information
Botao Hao (Purdue University)
Yasin Abbasi Yadkori (VinAI Research/ VinTech JSC.,)
Zheng Wen (DeepMind)
Guang Cheng (Purdue University)
More from the Same Authors
-
2021 Spotlight: Information Directed Sampling for Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Wei Deng -
2021 : Optimum-statistical Collaboration Towards Efficient Black-boxOptimization »
Wenjie Li · Chi-Hua Wang · Guang Cheng -
2022 Poster: Fair Bayes-Optimal Classifiers Under Predictive Parity »
Xianli Zeng · Edgar Dobriban · Guang Cheng -
2022 Spotlight: An Analysis of Ensemble Sampling »
Chao Qin · Zheng Wen · Xiuyuan Lu · Benjamin Van Roy -
2022 Poster: Why Do Artificially Generated Data Help Adversarial Robustness »
Yue Xing · Qifan Song · Guang Cheng -
2022 Poster: Phase Transition from Clean Training to Adversarial Training »
Yue Xing · Qifan Song · Guang Cheng -
2022 Poster: An Analysis of Ensemble Sampling »
Chao Qin · Zheng Wen · Xiuyuan Lu · Benjamin Van Roy -
2022 Poster: The Neural Testbed: Evaluating Joint Predictions »
Ian Osband · Zheng Wen · Seyed Mohammad Asghari · Vikranth Dwaracherla · Xiuyuan Lu · MORTEZA IBRAHIMI · Dieterich Lawson · Botao Hao · Brendan O'Donoghue · Benjamin Van Roy -
2022 Poster: Regret Bounds for Information-Directed Reinforcement Learning »
Botao Hao · Tor Lattimore -
2021 Poster: On the Algorithmic Stability of Adversarial Training »
Yue Xing · Qifan Song · Guang Cheng -
2021 Poster: Information Directed Sampling for Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Wei Deng -
2021 Poster: Bandit Phase Retrieval »
Tor Lattimore · Botao Hao -
2020 Poster: Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee »
Jincheng Bai · Qifan Song · Guang Cheng -
2020 Poster: High-Dimensional Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Mengdi Wang -
2020 Poster: Statistical Guarantees of Distributed Nearest Neighbor Classification »
Jiexin Duan · Xingye Qiao · Guang Cheng -
2020 Poster: Model Selection in Contextual Stochastic Bandit Problems »
Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari -
2020 Poster: Directional Pruning of Deep Neural Networks »
Shih-Kang Chao · Zhanyu Wang · Yue Xing · Guang Cheng -
2019 Poster: Thompson Sampling and Approximate Inference »
My Phan · Yasin Abbasi Yadkori · Justin Domke -
2019 Poster: Rates of Convergence for Large-scale Nearest Neighbor Classification »
Xingye Qiao · Jiexin Duan · Guang Cheng -
2018 Poster: Scalar Posterior Sampling with Applications »
Georgios Theocharous · Zheng Wen · Yasin Abbasi Yadkori · Nikos Vlassis -
2018 Poster: Early Stopping for Nonparametric Testing »
Meimei Liu · Guang Cheng -
2017 Poster: Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem »
Yasin Abbasi Yadkori · Peter Bartlett · Victor Gabillon -
2017 Poster: Conservative Contextual Linear Bandits »
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy -
2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization »
Joseph Jay Williams · Yasin Abbasi Yadkori · Finale Doshi-Velez -
2015 Poster: Non-convex Statistical Optimization for Sparse Tensor Graphical Model »
Wei Sun · Zhaoran Wang · Han Liu · Guang Cheng -
2015 Poster: Minimax Time Series Prediction »
Wouter Koolen · Alan Malek · Peter Bartlett · Yasin Abbasi Yadkori -
2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari -
2013 Workshop: Resource-Efficient Machine Learning »
Yevgeny Seldin · Yasin Abbasi Yadkori · Yacov Crammer · Ralf Herbrich · Peter Bartlett -
2013 Poster: Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions »
Yasin Abbasi Yadkori · Peter Bartlett · Varun Kanade · Yevgeny Seldin · Csaba Szepesvari -
2011 Poster: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari -
2011 Spotlight: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari