Skip to yearly menu bar Skip to main content


Spotlight Poster

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

Raymond Zhang · Richard Combes

West Ballroom A-D #6407
[ ] [ Project Page ]
Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

We consider Thompson Sampling (TS) for linear combinatorial semi-bandits and subgaussian rewards. We propose the first known TS whose finite-time regret does not scale exponentially with the dimension of the problem. We further show the “mismatched sampling paradox”: A learner who knows the rewards distributions and samples from the correct posterior distribution can perform exponentially worse than a learner who does not know the rewards and simply samples from a well-chosen Gaussian posterior.

Live content is unavailable. Log in and register to view live content