Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
Andrea Tirinzoni · Matteo Papini · Ahmed Touati · Alessandro Lazaric · Matteo Pirotta

Tue Nov 29 09:00 AM -- 11:00 AM (PST) @ Hall J #712

We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has recently been shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines two components: a novel constrained optimization problem for learning a realizable representation with good spectral properties, and a generalized likelihood ratio test for exploiting the recovered representation and avoiding excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BanditSRL can be easily combined with deep neural networks, and we show that regularizing towards HLS representations is beneficial in standard benchmarks.
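
To make the distinction between realizable and HLS representations concrete, here is a minimal sketch on a toy problem with two contexts and two actions. It assumes the definition of HLS commonly used in the related literature (the abstract does not state it): the features of the optimal actions span the whole feature space, i.e. the minimum eigenvalue of E_x[phi(x, a*(x)) phi(x, a*(x))^T] is positive. The function name, the toy features, and the parameter vector are hypothetical and chosen only for illustration; this is not the BanditSRL algorithm itself.

```python
import numpy as np

def optimal_feature_min_eig(phi, theta, context_probs):
    """Minimum eigenvalue of E_x[phi*(x) phi*(x)^T], where phi*(x) is the
    feature vector of the optimal action in context x.

    phi:           array of shape (n_contexts, n_actions, d)
    theta:         array of shape (d,), the reward parameter
    context_probs: array of shape (n_contexts,), the context distribution
    """
    rewards = phi @ theta                          # (n_contexts, n_actions)
    best = rewards.argmax(axis=1)                  # optimal action per context
    phi_star = phi[np.arange(phi.shape[0]), best]  # (n_contexts, d)
    design = (phi_star * context_probs[:, None]).T @ phi_star
    return float(np.linalg.eigvalsh(design)[0])    # eigenvalues are ascending

# Hypothetical toy problem: both representations predict the same rewards
# with the same theta, so both are realizable for this reward function.
theta = np.array([1.0, 1.0])
probs = np.array([0.5, 0.5])

phi_hls = np.array([[[1.0, 0.0], [0.0, 0.5]],
                    [[0.0, 1.0], [0.5, 0.0]]])
phi_not_hls = np.array([[[1.0, 0.0], [0.5, 0.0]],
                        [[1.0, 0.0], [0.0, 0.5]]])

same_rewards = np.allclose(phi_hls @ theta, phi_not_hls @ theta)
min_eig_hls = optimal_feature_min_eig(phi_hls, theta, probs)
min_eig_not_hls = optimal_feature_min_eig(phi_not_hls, theta, probs)
```

In this toy case the first representation has optimal features that span both dimensions, while in the second the optimal features all lie along a single direction, so its minimum eigenvalue is zero even though it fits the rewards equally well.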

Author Information

Andrea Tirinzoni (Meta AI)
Matteo Papini (Universitat Pompeu Fabra)
Ahmed Touati (Facebook)
Alessandro Lazaric (Facebook Artificial Intelligence Research)
Matteo Pirotta (META)
