Slate recommendation systems are commonly evaluated before deployment using off-policy evaluation methods, in which data collected under an existing logging policy is used to predict the performance of a new target policy. In practice, however, most recommendation systems are never observed recommending the vast majority of items. This is a problem because existing methods require that the target policy assign non-zero probability to an item only where the logging policy also assigns it non-zero probability. To circumvent this issue, we explore the use of item embeddings. By representing queries and slates in an embedding space, we can share information across them and extrapolate behavior for queries and items that have not yet been seen.
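The support requirement described above can be illustrated with a small sketch. This is not the authors' method. It is a toy inverse propensity scoring (IPS) example that assumes single-item recommendations rather than slates, and every name and number in it (`logging_policy`, `target_policy`, `reward`, and the helper functions) is hypothetical.

```python
# Toy illustration of the support condition in off-policy evaluation.
# Assumption: one item is recommended at a time and rewards are deterministic.
# All policies and rewards below are made-up values for illustration only.

logging_policy = {"a": 0.5, "b": 0.5, "c": 0.0}  # item "c" is never logged
target_policy = {"a": 0.2, "b": 0.3, "c": 0.5}   # but the target favors "c"
reward = {"a": 1.0, "b": 0.0, "c": 1.0}


def true_value(policy, reward):
    """Expected reward when items are drawn from `policy`."""
    return sum(prob * reward[item] for item, prob in policy.items())


def expected_ips(logging_policy, target_policy, reward):
    """Expectation of the IPS estimate over draws from the logging policy.

    Only items with non-zero logging probability can ever appear in the logs,
    so only those items contribute to the estimate.
    """
    return sum(
        mu * (target_policy.get(item, 0.0) / mu) * reward[item]
        for item, mu in logging_policy.items()
        if mu > 0
    )


def unsupported_items(logging_policy, target_policy):
    """Items the target policy can recommend but the logging policy never does."""
    return sorted(
        item
        for item, prob in target_policy.items()
        if prob > 0 and logging_policy.get(item, 0.0) == 0
    )


if __name__ == "__main__":
    print("unsupported items:", unsupported_items(logging_policy, target_policy))
    print("true target value:", true_value(target_policy, reward))
    print("expected IPS estimate:", expected_ips(logging_policy, target_policy, reward))
```

In this toy setup the IPS estimate is about 0.2 in expectation while the true value of the target policy is about 0.7, because the reward from the never-logged item is invisible to the estimator. That gap is the kind of failure that sharing information through embeddings is intended to address.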
Jaron Jia Rong Lee (Johns Hopkins University)
David Arbour (Adobe Research)
Georgios Theocharous (Adobe Research)