Poster
How Does Variance Shape the Regret in Contextual Bandits?
Zeyu Jia · Jian Qian · Alexander Rakhlin · Chen-Yu Wei
West Ballroom A-D #6705
We study realizable contextual bandits with general function approximation, a setting where the worst-case optimal regret bound is well understood, but refined data-dependent regret bounds are less explored. In this work, we investigate how to leverage the small variance of the reward to achieve improved regret bounds beyond the worst case. We show that, unlike in the worst-case regret bound, the eluder dimension plays a crucial role in the variance-dependent regret bound. We derive new lower bounds that demonstrate the role of the eluder dimension, and matching upper bounds in special cases where either the variance in each round is known to the learner or the function class provides distributional information.