Poster
Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards
Thomas Bonald · Alexandre Proutiere
Harrah's Special Events Center, 2nd Floor
Abstract:
We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent, uniformly distributed over [0, 1]. Rewards 1 and 0 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the first m failures, respectively, where m is a fixed parameter. This two-target algorithm achieves a long-term average regret in √(2n) for a large parameter m and a known time horizon n. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in 2√n. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments show the performance of the algorithm for finite time horizons.
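The abstract's two-target rule can be illustrated with a minimal simulation sketch. The target values target1 and target2 below, as well as the function name two_target_play, are illustrative assumptions and not the tuned thresholds analyzed in the paper; only the overall structure (discard an arm that misses either success target, exploit an arm that passes both) follows the description above.

```python
import random

def two_target_play(n, m, target1, target2, rng):
    """Simulate a two-target policy on an infinite-armed Bernoulli bandit
    with mean rewards drawn uniformly from [0, 1].

    A fresh arm is kept only if it yields target1 successes before its
    first failure; a surviving arm must then reach target2 total successes
    before m failures, after which it is exploited for the remaining budget.
    target1 and target2 are illustrative parameters, not the values tuned
    in the paper's analysis.
    """
    total = 0  # cumulative reward over the horizon
    t = 0      # number of pulls used so far
    while t < n:
        p = rng.random()  # mean reward of a freshly drawn arm
        successes, failures = 0, 0
        # First target: target1 successes before the first failure.
        while t < n and failures < 1 and successes < target1:
            r = 1 if rng.random() < p else 0
            total += r
            t += 1
            successes += r
            failures += 1 - r
        if successes < target1:
            continue  # discard the arm, draw a new one
        # Second target: target2 total successes before m failures.
        while t < n and failures < m and successes < target2:
            r = 1 if rng.random() < p else 0
            total += r
            t += 1
            successes += r
            failures += 1 - r
        if successes < target2:
            continue  # discard the arm, draw a new one
        # Arm passed both targets: exploit it for the rest of the horizon.
        while t < n:
            total += 1 if rng.random() < p else 0
            t += 1
    return total
```

Under the uniform prior, arms that survive both targets tend to have mean reward close to 1, so most of the horizon is spent exploiting a near-optimal arm; the paper's analysis shows how to choose the targets as a function of n and m to make the resulting regret √(2n).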