`

Timezone: »

Poster
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
Pierre Perrault · Etienne Boursier · Michal Valko · Vianney Perchet

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #1019
We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate \emph{sub-Gaussian} family. We propose to answer the above question for these two families by analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For mutually independent outcomes in $[0,1]$, we propose a tight analysis of CTS using Beta priors. We then look at the more general setting of multivariate sub-Gaussian outcomes and propose a tight analysis of CTS using Gaussian priors. This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB), which, although optimal, is not computationally efficient.

#### Author Information

##### Michal Valko (DeepMind)

Michal is a machine learning scientist in DeepMind Paris, SequeL team at Inria, and the lecturer of the master course Graphs in Machine Learning at l'ENS Paris-Saclay. Michal is primarily interested in designing algorithms that would require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimizing the data that humans need to spend inspecting, classifying, or “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he is working in domains that are able to deal with minimal feedback, such as online learning, bandit algorithms, semi-supervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions where exploiting the structure leads to provably faster learning. Structured learning requires more time and space resources and therefore the most recent work of Michal includes efficient approximations such as graph and matrix sketching with learning guarantees. In past, the common thread of Michal's work has been adaptive graph-based learning and its application to real-world applications such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his Ph.D. in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and after was a postdoc of Rémi Munos before taking a permanent position at Inria in 2012.