Poster

How Does Variance Shape the Regret in Contextual Bandits?

Zeyu Jia · Jian Qian · Alexander Rakhlin · Chen-Yu Wei

West Ballroom A-D #6705
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

We study realizable contextual bandits with general function approximation, a setting where the worst-case optimal regret bound is well-understood, but refined data-dependent regret bounds are less explored. In this work, we investigate how to leverage the small variance of the reward to achieve improved regret bounds beyond the worst case. We show that, unlike in the worst-case regret bound, the eluder dimension plays a crucial role in the variance-dependent regret bound. We derive new lower bounds that demonstrate the role of the eluder dimension, and matching upper bounds in special cases where either the variance in each round is known to the learner, or the function class provides distributional information.
