NeurIPS 2019 Expo Workshop

Dec. 5, 2020

Real world reinforcement learning with Vowpal Wabbit

Sponsor: Microsoft


Reinforcement learning is increasingly used to solve real-world personalization and optimization problems, using online, sample-efficient algorithms such as contextual bandits. Companies such as Netflix and The New York Times use contextual bandits to personalize content and optimize engagement. Microsoft uses contextual bandits across multiple deployments and recently released the Personalizer Azure Cognitive Service, which is the world's first real-world reinforcement learning service.

Vowpal Wabbit is an open source machine learning library, used extensively in industry, and is the first public terascale learning system. It provides fast, scalable machine learning and has unique capabilities such as learning to search, active learning, contextual memory, and extreme multiclass learning. It focuses on reinforcement learning and provides production-ready implementations of contextual bandit algorithms. Vowpal Wabbit sees significant innovation as a research-to-production vehicle for Microsoft Research.

Come learn about reinforcement learning, Vowpal Wabbit, and how to apply contextual bandits to problems using Vowpal Wabbit.