Skip to yearly menu bar Skip to main content


Poster

Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards

Thomas Bonald · Alexandre Proutiere

Harrah's Special Events Center, 2nd Floor

Abstract: We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent, uniformly distributed over [0,1]. Rewards 0 and 1 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and the first m failures, respectively, where m is a fixed parameter. This two-target algorithm achieves a long-term average regret in 2n for a large parameter m and a known time horizon n. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in 2n. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments show the performance of the algorithm for finite time horizons.

Live content is unavailable. Log in and register to view live content