Timezone: »
Poster
Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards
Thomas Bonald · Alexandre Proutiere
Sat Dec 07 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor
We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent, uniformly distributed over $[0,1]$. Rewards 0 and 1 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and the first $m$ failures, respectively, where $m$ is a fixed parameter. This two-target algorithm achieves a long-term average regret in $\sqrt{2n}$ for a large parameter $m$ and a known time horizon $n$. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in $2\sqrt{n}$. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments show the performance of the algorithm for finite time horizons.
Author Information
Thomas Bonald (Telecom ParisTech)
Alexandre Proutiere (KTH)
More from the Same Authors
-
2021 Poster: Fast Pure Exploration via Frank-Wolfe »
Po-An Wang · Ruo-Chun Tzeng · Alexandre Proutiere -
2021 Poster: Navigating to the Best Policy in Markov Decision Processes »
Aymen Al Marjani · AurĂ©lien Garivier · Alexandre Proutiere -
2020 Poster: Regret in Online Recommendation Systems »
Kaito Ariu · Narae Ryu · Se-Young Yun · Alexandre Proutiere -
2020 Poster: Optimal Best-arm Identification in Linear Bandits »
Yassir Jedra · Alexandre Proutiere -
2019 Poster: Optimal Sampling and Clustering in the Stochastic Block Model »
Se-Young Yun · Alexandre Proutiere -
2018 Poster: Exploration in Structured Reinforcement Learning »
Jungseul Ok · Alexandre Proutiere · Damianos Tranos -
2018 Oral: Exploration in Structured Reinforcement Learning »
Jungseul Ok · Alexandre Proutiere · Damianos Tranos -
2017 Poster: Minimal Exploration in Structured Stochastic Bandits »
Richard Combes · Stefan Magureanu · Alexandre Proutiere -
2017 Spotlight: Minimal Exploration in Structured Stochastic Bandits »
Richard Combes · Stefan Magureanu · Alexandre Proutiere -
2017 Poster: A Minimax Optimal Algorithm for Crowdsourcing »
Thomas Bonald · Richard Combes -
2016 Poster: Optimal Cluster Recovery in the Labeled Stochastic Block Model »
Se-Young Yun · Alexandre Proutiere -
2015 Poster: Fast and Memory Optimal Low-Rank Matrix Approximation »
Se-Young Yun · marc lelarge · Alexandre Proutiere -
2015 Poster: Combinatorial Bandits Revisited »
Richard Combes · M. Sadegh Talebi · Alexandre Proutiere · marc lelarge -
2014 Poster: Streaming, Memory Limited Algorithms for Community Detection »
Se-Young Yun · marc lelarge · Alexandre Proutiere