

Bandit Learning with Implicit Feedback

Yi Qi · Qingyun Wu · Hongning Wang · Jie Tang · Maosong Sun

Room 517 AB #156

Keywords: [ Bandit Algorithms ] [ Variational Inference ] [ Recommender Systems ]


Implicit feedback, such as user clicks, is abundant in online information service systems, but it provides only weak evidence of how users evaluate the system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our analysis of the proposed algorithm's upper regret bound establishes that learning from implicit feedback is feasible in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.
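
The composition of examination and relevance, together with Thompson sampling for arm selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that both the examination and relevance probabilities are logistic functions of separate feature vectors and that the approximate posterior over each parameter vector is a diagonal Gaussian. All names (`click_probability`, `select_arm`, `x_rel`, `x_exam`, the `posterior` dictionary) are hypothetical, and the variational Bayesian update of the posterior is omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def click_probability(theta_rel, theta_exam, x_rel, x_exam):
    # Assumed model: a click requires that the result is examined
    # (latent, unobserved) AND judged relevant, so
    # P(click) = P(examined) * P(relevant | examined).
    return sigmoid(dot(theta_exam, x_exam)) * sigmoid(dot(theta_rel, x_rel))


def sample_gaussian(mean, std, rng):
    # Draw one sample from a diagonal Gaussian (assumed posterior form).
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def select_arm(arms, posterior, rng):
    # Thompson sampling: sample parameters from the approximate posterior,
    # then choose the arm with the highest click probability under the sample.
    theta_rel = sample_gaussian(posterior["rel_mean"], posterior["rel_std"], rng)
    theta_exam = sample_gaussian(posterior["exam_mean"], posterior["exam_std"], rng)
    scores = [
        click_probability(theta_rel, theta_exam, arm["x_rel"], arm["x_exam"])
        for arm in arms
    ]
    return max(range(len(arms)), key=scores.__getitem__)
```

In a full loop, the observed click (or non-click) for the selected arm would be used to update the posterior means and standard deviations before the next round. A non-click is ambiguous in this model, since it may mean the result was not examined rather than that it was irrelevant, which is why the examination event is treated as a latent variable.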
