NeurIPS 2020: Online Learning in Contextual Bandits using Gated Linear Networks



Online Learning in Contextual Bandits using Gated Linear Networks

Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness

Poster Session 3
on Wed, Dec 9th, 2020 @ 05:00 – 07:00 GMT


Abstract: We introduce a new, fully online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). The algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well suited to the online setting. By leveraging the data-dependent gating of the GLN, we estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically compare GLCB against 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains mean first-place despite being the only online method, and we further support these results with a theoretical study of its convergence properties.
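
The abstract does not spell out how gating can yield an uncertainty estimate at almost no extra cost. The sketch below is a rough illustration of the general idea only, not the authors' GLCB implementation: a context is mapped to a binary "signature" by fixed random hyperplanes, each (action, signature) pair keeps its own online linear predictor, and the number of times a signature has been seen drives a UCB-style exploration bonus. The class name `GatedCountBandit`, the single-layer structure, the squared-error update, the bonus formula, and all hyperparameters are assumptions made for illustration.

```python
import math
import random


class GatedCountBandit:
    """Toy online contextual bandit with halfspace gating (illustrative only)."""

    def __init__(self, n_actions, dim, n_planes=4, lr=0.1, bonus=1.0, seed=0):
        rng = random.Random(seed)
        # Fixed random hyperplanes; the sign pattern of a context is its signature.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.n_actions = n_actions
        self.dim = dim
        self.lr = lr
        self.bonus = bonus
        self.weights = {}  # (action, signature) -> linear weights
        self.counts = {}   # (action, signature) -> number of updates seen
        self.t = 0         # total number of updates so far

    def signature(self, x):
        # One bit per hyperplane: which side of the plane the context falls on.
        return tuple(int(sum(p * v for p, v in zip(plane, x)) > 0)
                     for plane in self.planes)

    def predict(self, action, x):
        w = self.weights.get((action, self.signature(x)))
        return 0.0 if w is None else sum(wi * xi for wi, xi in zip(w, x))

    def score(self, action, x):
        # The count is a by-product of gating, so the bonus costs one lookup.
        n = self.counts.get((action, self.signature(x)), 0)
        if n == 0:
            return math.inf  # unseen region for this action: try it first
        return self.predict(action, x) + self.bonus * math.sqrt(math.log(self.t + 1) / n)

    def select(self, x):
        scores = [self.score(a, x) for a in range(self.n_actions)]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, action, x, reward):
        # Single online gradient step on squared error; no replay buffer.
        key = (action, self.signature(x))
        w = self.weights.setdefault(key, [0.0] * self.dim)
        err = reward - sum(wi * xi for wi, xi in zip(w, x))
        for i in range(self.dim):
            w[i] += self.lr * err * x[i]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.t += 1


if __name__ == "__main__":
    rng = random.Random(1)
    bandit = GatedCountBandit(n_actions=2, dim=2)
    total = 0.0
    for _ in range(500):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        a = bandit.select(x)
        # Hypothetical reward: action 0 pays off when x[0] > 0, action 1 otherwise.
        r = 1.0 if (a == 0) == (x[0] > 0) else 0.0
        bandit.update(a, x, r)
        total += r
    print("average reward:", total / 500)
```

Each round performs one prediction and one update per observed context, which is what makes the loop fully online. The method described in the abstract builds on a full GLN rather than the single gated layer assumed here.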
