Timezone: »

Regret of Queueing Bandits
Subhashini Krishnasamy · Rajat Sen · Ramesh Johari · Sanjay Shakkottai

Mon Dec 05 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #8

We consider a variant of the multiarmed bandit problem where jobs queue for service, and service rates of different servers may be unknown. We study algorithms that minimize queue-regret: the (expected) difference between the queue-lengths obtained by the algorithm, and those obtained by a genie-aided matching algorithm that knows exact service rates. A naive view of this problem would suggest that queue-regret should grow logarithmically: since queue-regret cannot be larger than classical regret, results for the standard MAB problem give algorithms that ensure queue-regret increases no more than logarithmically in time. Our paper shows surprisingly more complex behavior. In particular, the naive intuition is correct as long as the bandit algorithm's queues have relatively long regenerative cycles: in this case queue-regret is similar to cumulative regret, and scales (essentially) logarithmically. However, we show that this "early stage" of the queueing bandit eventually gives way to a "late stage", where the optimal queue-regret scaling is O(1/t). We demonstrate an algorithm that (order-wise) achieves this asymptotic queue-regret, and also exhibits close to optimal switching time from the early stage to the late stage.

Author Information

Subhashini Krishnasamy (The University of Texas at Austin)
Rajat Sen (The University of Texas at Austin)
Ramesh Johari (Stanford University)
Sanjay Shakkottai (The University of Texas at Aus)

More from the Same Authors