
Sub-sampling for Efficient Non-Parametric Bandit Exploration

Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

Spotlight presentation: Orals & Spotlights Track 14: Reinforcement Learning
2020-12-08, 19:10–19:20 (UTC-08:00)
Poster Session 3
2020-12-08, 21:00–23:00 (UTC-08:00)
Abstract: In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian, and Poisson distributions). Unlike Thompson Sampling, which requires specifying a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA), which combine the sub-sampling idea first used by the BESA and SSMC algorithms with different sub-sampling schemes; in particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.
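To make the sub-sampling duel idea concrete, here is a minimal sketch of one round of an SDA-style algorithm with Random Block sampling, written from the abstract's high-level description rather than the paper's exact pseudocode. The leader-selection rule, the duel criterion (challenger's full empirical mean against the mean of a same-size random block from the leader's history), and the function names are illustrative assumptions.

```python
import random

def random_block(history, size, rng):
    # Random Block sampling (assumed form): a contiguous block of
    # `size` observations starting at a uniformly drawn index.
    start = rng.randrange(len(history) - size + 1)
    return history[start:start + size]

def rb_sda_round(histories, rng):
    """One round of a simplified RB-SDA-style duel (illustrative sketch).

    `histories` maps an arm index to its list of observed rewards.
    Returns the set of arms to pull this round.
    """
    mean = lambda xs: sum(xs) / len(xs)
    # Leader: the arm with the most samples (ties broken by empirical mean).
    leader = max(histories,
                 key=lambda k: (len(histories[k]), mean(histories[k])))
    to_pull = set()
    for k, h in histories.items():
        if k == leader:
            continue
        # Duel: the challenger's full-sample mean is compared with the
        # leader's mean over a random block of the same size.
        block = random_block(histories[leader], len(h), rng)
        if mean(h) >= mean(block):
            to_pull.add(k)
    # If no challenger wins its duel, pull the leader instead.
    return to_pull or {leader}
```

Note that the duel uses no confidence bonus or prior: exploration comes entirely from the randomness of the sub-sample, which is what lets the method avoid distribution-dependent tuning.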
