Poster in Workshop: A causal view on dynamical systems

Causal Bandits: Online Decision-Making in Endogenous Settings

Jingwen Zhang · Yifang Chen · Amandeep Singh


Abstract: Multi-Armed Bandits (MAB) are now commonly deployed in economic applications. However, regret guarantees for even state-of-the-art linear bandit algorithms, such as Optimism in the Face of Uncertainty Linear bandit (OFUL), rely on a strong exogeneity assumption on the arm covariates. This assumption is often violated in economic contexts, and using such algorithms can then lead to sub-optimal decisions. In this paper, we consider online learning in linear stochastic multi-armed bandit problems with endogenous covariates. We propose an algorithm, which we term BanditIV, that uses instrumental variables to correct for the resulting bias, and we prove an $\tilde{\mathcal{O}}(k\sqrt{T})$ upper bound on its expected regret. In economic contexts it is also important to understand how the estimates of the model parameters behave asymptotically. To this end, we additionally propose the $\epsilon$-BanditIV algorithm and demonstrate its asymptotic consistency and normality while retaining the same regret bound. Finally, we carry out extensive Monte Carlo simulations comparing our algorithms with other methods, and find that BanditIV and $\epsilon$-BanditIV significantly outperform the existing methods.
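To make the endogeneity problem concrete, the following minimal sketch contrasts ordinary least squares with two-stage least squares (a standard instrumental-variable estimator) on simulated data. It is not the BanditIV algorithm from the paper: it is an offline illustration only, and the data-generating process, the function names (`simulate`, `ols`, `two_stage_least_squares`), and all constants are assumptions chosen for the example.

```python
import numpy as np

def simulate(n, theta, rng):
    """Hypothetical endogenous linear model (an assumption for illustration).

    An unobserved confounder u enters both the covariates x and the reward y,
    so x is correlated with the reward noise. The instruments z shift x but
    are independent of u.
    """
    k = theta.shape[0]
    z = rng.normal(size=(n, k))                   # instruments
    u = rng.normal(size=(n, 1))                   # unobserved confounder
    x = z + u + 0.1 * rng.normal(size=(n, k))     # endogenous covariates
    y = x @ theta + 2.0 * u[:, 0] + 0.1 * rng.normal(size=n)
    return z, x, y

def ols(x, y):
    """Ordinary least squares; biased here because x correlates with the noise."""
    return np.linalg.lstsq(x, y, rcond=None)[0]

def two_stage_least_squares(z, x, y):
    """Two-stage least squares using z as instruments for x."""
    # Stage 1: project the covariates onto the instruments.
    gamma = np.linalg.lstsq(z, x, rcond=None)[0]
    x_hat = z @ gamma
    # Stage 2: regress the reward on the projected covariates.
    return np.linalg.lstsq(x_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])                     # true parameter (assumed)
z, x, y = simulate(20000, theta, rng)

theta_ols = ols(x, y)
theta_iv = two_stage_least_squares(z, x, y)
ols_err = np.linalg.norm(theta_ols - theta)
iv_err = np.linalg.norm(theta_iv - theta)

print("OLS estimate:", theta_ols, "error:", ols_err)
print("IV estimate: ", theta_iv, "error:", iv_err)
```

In this simulated setting the least-squares estimate stays away from the true parameter no matter how much data is collected, whereas the instrumented estimate approaches it. A bandit algorithm that selects arms from the biased estimate would inherit that error, which is the failure mode the abstract describes.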
