
Bandit Learning with Implicit Feedback
Yi Qi · Qingyun Wu · Hongning Wang · Jie Tang · Maosong Sun

Thu Dec 06 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #156

Implicit feedback, such as user clicks, is abundant in online information service systems but does not provide substantial evidence of how users evaluate the system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our regret upper bound analysis of the proposed algorithm establishes the feasibility of learning from implicit feedback in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.
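
To make the composition of examination and relevance concrete, here is a minimal sketch of how arm selection could look under such a model. It is an illustration under stated assumptions, not the authors' implementation: the logistic form of both factors, the diagonal Gaussian approximate posteriors, and all names (click_probability, select_arm, post_rel, post_exam) are hypothetical. The variational Bayesian update of the posteriors is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def click_probability(theta_rel, theta_exam, x_rel, x_exam):
    """Assumed click model: P(click) = P(examined) * P(relevant).

    Both factors are modeled as logistic functions here (an assumption).
    """
    return sigmoid(dot(theta_exam, x_exam)) * sigmoid(dot(theta_rel, x_rel))


def sample_diag_gaussian(mean, var, rng):
    """Draw one parameter vector from a diagonal Gaussian posterior."""
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


def select_arm(arms, post_rel, post_exam, rng):
    """Thompson sampling step: sample parameters, then act greedily.

    arms      -- list of (x_rel, x_exam) feature pairs, one per arm
    post_rel  -- (mean, var) of the approximate posterior over theta_rel
    post_exam -- (mean, var) of the approximate posterior over theta_exam
    """
    theta_rel = sample_diag_gaussian(post_rel[0], post_rel[1], rng)
    theta_exam = sample_diag_gaussian(post_exam[0], post_exam[1], rng)
    scores = [
        click_probability(theta_rel, theta_exam, x_rel, x_exam)
        for x_rel, x_exam in arms
    ]
    return max(range(len(arms)), key=lambda i: scores[i])


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [([1.0, 0.0], [1.0]), ([0.0, 1.0], [1.0])]
    post_rel = ([0.5, -0.5], [1.0, 1.0])
    post_exam = ([0.0], [1.0])
    print("selected arm:", select_arm(arms, post_rel, post_exam, rng))
```

Because a click requires both examination and relevance, a missing click is ambiguous in this sketch: the item may have been irrelevant, or it may never have been examined. That ambiguity is why the examination factor is treated as latent.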

Author Information

Yi Qi (Tsinghua University)
Qingyun Wu (University of Virginia)
Hongning Wang (University of Virginia)
Jie Tang (Tsinghua University)
Maosong Sun
