Poster
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
Muhammad Faaiz Taufiq · Arnaud Doucet · Rob Cornish · Jean-Francois Ton
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation. However, current OPE methods, such as Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators, suffer from high variance, particularly in cases of low overlap between target and behaviour policies or large action and context spaces. In this paper, we introduce a new OPE estimator for contextual bandits, the Marginal Ratio (MR) estimator, which focuses on the shift in the marginal distribution of outcomes $Y$ instead of the policies themselves. Through rigorous theoretical analysis, we demonstrate the benefits of the MR estimator compared to conventional methods like IPW and DR in terms of variance reduction. Additionally, we establish a connection between the MR estimator and the state-of-the-art Marginalized Inverse Propensity Score (MIPS) estimator, proving that MR achieves lower variance among a generalized family of MIPS estimators. We further illustrate the utility of the MR estimator in causal inference settings, where it exhibits enhanced performance in estimating Average Treatment Effects (ATE). Our experiments on synthetic and real-world datasets corroborate our theoretical findings and highlight the practical advantages of the MR estimator in OPE for contextual bandits.
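To make the contrast with IPW concrete, here is a minimal sketch in Python (NumPy) on a small, fully known discrete bandit. It assumes the MR estimator takes the form (1/n) Σ w(y_i) y_i with w(y) = p^{π*}(y) / p^{π_b}(y), the ratio of the marginal outcome densities under the target and behaviour policies. IPW instead averages ρ(a_i, x_i) y_i with ρ = π*(a|x) / π_b(a|x). The simulator and the names make_problem, exact_quantities and simulate are hypothetical. The weights w(y) are computed exactly from the known model here; in practice w(y) must be estimated from logged data, and this sketch does not attempt that step. Because w(y) = E_{π_b}[ρ(A, X) | Y = y], the law of total variance gives Var(w(Y)Y) ≤ Var(ρ(A, X)Y), and the code lets you check this numerically.

```python
import numpy as np


def softmax(logits):
    """Normalise the last axis of `logits` into probabilities."""
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def make_problem(n_contexts=5, n_actions=20, n_outcomes=3, seed=0):
    """Hypothetical discrete bandit: p(x), behaviour pi_b(a|x),
    target pi_t(a|x) and outcome model p(y|x,a) for y in {0, ..., n_outcomes-1}."""
    rng = np.random.default_rng(seed)
    p_x = np.full(n_contexts, 1.0 / n_contexts)
    pi_b = softmax(rng.normal(size=(n_contexts, n_actions)))
    # Scaled-up logits make the target policy peakier, reducing overlap with pi_b.
    pi_t = softmax(3.0 * rng.normal(size=(n_contexts, n_actions)))
    p_y = softmax(rng.normal(size=(n_contexts, n_actions, n_outcomes)))
    return p_x, pi_b, pi_t, p_y


def exact_quantities(p_x, pi_b, pi_t, p_y):
    """Exact marginal ratio w(y), target value and per-sample variances."""
    ys = np.arange(p_y.shape[-1], dtype=float)
    rho = pi_t / pi_b                                      # policy ratio rho(a, x)
    joint_b = p_x[:, None, None] * pi_b[:, :, None] * p_y  # p(x, a, y) under pi_b
    joint_t = p_x[:, None, None] * pi_t[:, :, None] * p_y  # p(x, a, y) under pi_t
    marg_b = joint_b.sum(axis=(0, 1))                      # p^{pi_b}(y)
    marg_t = joint_t.sum(axis=(0, 1))                      # p^{pi_t}(y)
    w = marg_t / marg_b                                    # marginal density ratio w(y)
    value = float((marg_t * ys).sum())                     # target value E_{pi_t}[Y]
    # Both single-sample terms have mean `value` under pi_b, so compare second moments.
    var_ipw = float((joint_b * (rho[:, :, None] * ys) ** 2).sum() - value**2)
    var_mr = float((joint_b * (w * ys) ** 2).sum() - value**2)
    return w, value, var_ipw, var_mr


def simulate(p_x, pi_b, p_y, rho, w, n, rng):
    """Draw n logged samples under pi_b and return the (IPW, MR) estimates."""
    x = rng.choice(len(p_x), size=n, p=p_x)
    # Inverse-CDF sampling of a ~ pi_b(.|x) and y ~ p(.|x, a), one row per sample.
    a = (pi_b[x].cumsum(axis=1) < rng.random((n, 1))).sum(axis=1)
    a = np.minimum(a, pi_b.shape[1] - 1)  # guard against round-off at the top end
    y = (p_y[x, a].cumsum(axis=1) < rng.random((n, 1))).sum(axis=1)
    y = np.minimum(y, p_y.shape[2] - 1)
    ipw = np.mean(rho[x, a] * y)  # reweights by the policy ratio
    mr = np.mean(w[y] * y)        # reweights by the marginal outcome ratio
    return ipw, mr


if __name__ == "__main__":
    p_x, pi_b, pi_t, p_y = make_problem()
    w, value, var_ipw, var_mr = exact_quantities(p_x, pi_b, pi_t, p_y)
    print(f"true value {value:.4f}; per-sample variance IPW {var_ipw:.3f}, MR {var_mr:.3f}")
    rng = np.random.default_rng(1)
    rho = pi_t / pi_b
    runs = np.array([simulate(p_x, pi_b, p_y, rho, w, 1000, rng) for _ in range(200)])
    print("mean of estimates (IPW, MR):", runs.mean(axis=0))
    print("std of estimates  (IPW, MR):", runs.std(axis=0))
```

Running it prints the exact per-sample variances and the spread of each estimator over repeated simulations. The gap between the two comes from E[Y² Var(ρ | Y)], so it tends to widen as the target policy concentrates on actions the behaviour policy rarely takes.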
Author Information
Muhammad Faaiz Taufiq (University of Oxford)
Arnaud Doucet (University of Oxford)
Rob Cornish (University of Oxford)
Jean-Francois Ton (ByteDance)
More from the Same Authors
2022: Spectral Diffusion Processes
Angus Phillips · Thomas Seror · Michael Hutchinson · Valentin De Bortoli · Arnaud Doucet · Emile Mathieu
2023: Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Yang Liu · Yuanshun (Kevin) Yao · Jean-Francois Ton · Xiaoying Zhang · Ruocheng Guo · Hao Cheng · Yegor Klochkov · Muhammad Faaiz Taufiq · Hang Li
2023 Poster: Trans-Dimensional Generative Modeling via Jump Diffusion Models
Andrew Campbell · William Harvey · Christian Weilbach · Valentin De Bortoli · Thomas Rainforth · Arnaud Doucet
2023 Poster: Diffusion Schrödinger Bridge Matching
Yuyang Shi · Valentin De Bortoli · Andrew Campbell · Arnaud Doucet
2023 Poster: Invariant Learning via Probability of Sufficient and Necessary Causes
Mengyue Yang · Yonggang Zhang · Zhen Fang · Yali Du · Furui Liu · Jean-Francois Ton · Jianhong Wang · Jun Wang
2023 Poster: Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters
Maxence Noble · Valentin De Bortoli · Arnaud Doucet · Alain Durmus
2023 Poster: A Unified Framework for U-Net Design and Analysis
Christopher Williams · Fabian Falck · George Deligiannidis · Chris C Holmes · Arnaud Doucet · Saifuddin Syed
2023 Poster: Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics
Kamélia Daudel · Joe Benton · Yuyang Shi · Arnaud Doucet
2022 Spotlight: Lightning Talks 1A-4
Siwei Wang · Jing Liu · Nianqiao Ju · Shiqian Li · Eloïse Berthier · Muhammad Faaiz Taufiq · Arsene Fansi Tchango · Chen Liang · Chulin Xie · Jordan Awan · Jean-Francois Ton · Ziad Kobeissi · Wenguan Wang · Xinwang Liu · Kewen Wu · Rishab Goel · Jiaxu Miao · Suyuan Liu · Julien Martel · Ruobin Gong · Francis Bach · Chi Zhang · Rob Cornish · Sanmi Koyejo · Zhi Wen · Yee Whye Teh · Yi Yang · Jiaqi Jin · Bo Li · Yixin Zhu · Vinayak Rao · Wenxuan Tu · Gaetan Marceau Caron · Arnaud Doucet · Xinzhong Zhu · Joumana Ghosn · En Zhu
2022 Spotlight: Conformal Off-Policy Prediction in Contextual Bandits
Muhammad Faaiz Taufiq · Jean-Francois Ton · Rob Cornish · Yee Whye Teh · Arnaud Doucet
2022 Poster: Conformal Off-Policy Prediction in Contextual Bandits
Muhammad Faaiz Taufiq · Jean-Francois Ton · Rob Cornish · Yee Whye Teh · Arnaud Doucet
2022 Poster: A Continuous Time Framework for Discrete Denoising Models
Andrew Campbell · Joe Benton · Valentin De Bortoli · Thomas Rainforth · George Deligiannidis · Arnaud Doucet
2022 Poster: Score-Based Diffusion meets Annealed Importance Sampling
Arnaud Doucet · Will Grathwohl · Alexander Matthews · Heiko Strathmann
2022 Poster: A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs
Fabian Falck · Christopher Williams · Dominic Danks · George Deligiannidis · Christopher Yau · Chris C Holmes · Arnaud Doucet · Matthew Willetts
2022 Poster: Riemannian Score-Based Generative Modelling
Valentin De Bortoli · Emile Mathieu · Michael Hutchinson · James Thornton · Yee Whye Teh · Arnaud Doucet
2022 Poster: Towards Learning Universal Hyperparameter Optimizers with Transformers
Yutian Chen · Xingyou Song · Chansoo Lee · Zi Wang · Richard Zhang · David Dohan · Kazuya Kawakami · Greg Kochanski · Arnaud Doucet · Marc'Aurelio Ranzato · Sagi Perel · Nando de Freitas
2019 Poster: Augmented Neural ODEs
Emilien Dupont · Arnaud Doucet · Yee Whye Teh
2018 Poster: Hamiltonian Variational Auto-Encoder
Anthony Caterini · Arnaud Doucet · Dino Sejdinovic
2017 Poster: Filtering Variational Objectives
Chris Maddison · John Lawson · George Tucker · Nicolas Heess · Mohammad Norouzi · Andriy Mnih · Arnaud Doucet · Yee Teh
2017 Poster: Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling
Andrei-Cristian Barbos · Francois Caron · Jean-François Giovannelli · Arnaud Doucet
2015 Workshop: Scalable Monte Carlo Methods for Bayesian Analysis of Big Data
Babak Shahbaba · Yee Whye Teh · Max Welling · Arnaud Doucet · Christophe Andrieu · Sebastian J. Vollmer · Pierre Jacob
2015 Poster: Expectation Particle Belief Propagation
Thibaut Lienart · Yee Whye Teh · Arnaud Doucet
2014 Poster: Asynchronous Anytime Sequential Monte Carlo
Brooks Paige · Frank Wood · Arnaud Doucet · Yee Whye Teh
2014 Oral: Asynchronous Anytime Sequential Monte Carlo
Brooks Paige · Frank Wood · Arnaud Doucet · Yee Whye Teh
2009 Poster: Bayesian Nonparametric Models on Decomposable Graphs
Francois Caron · Arnaud Doucet
2009 Tutorial: Sequential Monte-Carlo Methods
Arnaud Doucet · Nando de Freitas
2007 Spotlight: Bayesian Policy Learning with Trans-Dimensional MCMC
Matthew Hoffman · Arnaud Doucet · Nando de Freitas · Ajay Jasra
2007 Poster: Bayesian Policy Learning with Trans-Dimensional MCMC
Matthew Hoffman · Arnaud Doucet · Nando de Freitas · Ajay Jasra