Skip to yearly menu bar Skip to main content


Efficient Batched Algorithm for Contextual Linear Bandits with Large Action Space via Soft Elimination

Osama Hanna · Lin Yang · Christina Fragouli

Great Hall & Hall B1+B2 (level 1) #1813
[ ]
Wed 13 Dec 8:45 a.m. PST — 10:45 a.m. PST

Abstract: In this paper, we provide the first efficient batched algorithm for contextual linear bandits with large action spaces. Unlike existing batched algorithms that rely on action elimination, which are not implementable for large action sets, our algorithm only uses a linear optimization oracle over the action set to design the policy. The proposed algorithm achieves a regret upper bound $\tilde{O}(\sqrt{T})$ with high probability, and uses $O(\log\log T)$ batches, matching the lower bound on the number of batches (Gao et al., 2019). When specialized to linear bandits, our algorithm can achieve a high probability gap-dependent regret bound of $\tilde{O}(1/\Delta_{\min})$ with the optimal $\log T$ number of batches, where $\Delta_{\min}$ is the minimum reward gap between a suboptimal arm and the optimal. Our result is achieved via a novel soft elimination approach, that entails $\text{``}$shaping$\text{"}$ the action sets at each batch so that we can efficiently identify (near) optimal actions.

Chat is not available.