

Poster

Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Ruitu Xu · Yifei Min · Tianhao Wang

Great Hall & Hall B1+B2 (level 1) #1805

Abstract: Linear contextual bandits represent a fundamental class of models with numerous real-world applications, and it is critical to develop algorithms that can effectively manage noise with unknown variance, ensuring provable guarantees for both worst-case constant-variance noise and deterministic reward scenarios. In this paper, we study linear contextual bandits with heteroscedastic noise and propose the first noise-adaptive Thompson sampling-style algorithm that achieves a variance-dependent regret upper bound of $\widetilde{O}\left(d^{3/2}+d^{3/2}\sqrt{\sum_{t=1}^{T}\sigma_t^2}\right)$, where $d$ is the dimension of the context vectors and $\sigma_t^2$ is the variance of the reward in round $t$. This recovers the existing $\widetilde{O}(d^{3/2}\sqrt{T})$ regret guarantee in the constant-variance regime and further improves to $\widetilde{O}(d^{3/2})$ in the deterministic regime, thus achieving a smooth interpolation in between. Our approach utilizes a stratified sampling procedure to overcome the too-conservative optimism in the linear Thompson sampling algorithm for linear contextual bandits.
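As a brief check (not stated verbatim in the abstract, but following directly from the bound above), the two regimes specialize as follows:

\begin{align*}
  \text{Constant variance } (\sigma_t^2 = \sigma^2):\quad
    & d^{3/2} + d^{3/2}\sqrt{\textstyle\sum_{t=1}^{T}\sigma^2}
      = d^{3/2} + d^{3/2}\sigma\sqrt{T}
      = \widetilde{O}\!\left(d^{3/2}\sqrt{T}\right), \\
  \text{Deterministic rewards } (\sigma_t^2 = 0):\quad
    & d^{3/2} + d^{3/2}\sqrt{0}
      = \widetilde{O}\!\left(d^{3/2}\right).
\end{align*}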
