Workshop: Causal Machine Learning for Real-World Impact

Non-Stationary Causal Bandits



The causal bandit problem extends the conventional multi-armed bandit problem to settings in which the available arms are not independent of each other but are related to one another through a Bayesian graph. This extension is more natural, since in everyday bandit settings the actions are often causally related and are therefore better represented as a causal bandit problem. Moreover, the class of conventional multi-armed bandits is contained in that of causal bandits, since any instance of the former can be modeled in the latter setting by a Bayesian graph in which all variables are independent. However, it is generally assumed that the probability distributions in the Bayesian graph are stationary. In this paper, we design non-stationary causal bandit algorithms by equipping the current state of the art (mainly \algo{causal UCB}, \algo{causal Thompson Sampling}, \algo{causal KL UCB} and \algo{Online Causal TS}) with the restarted Bayesian online change-point detector \cite{RBOCPD}. Experimental results show that regret is minimized when optimal change-point detection is used.
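The idea of restarting a bandit algorithm when a change point is detected can be illustrated with a minimal Python sketch. It is not the paper's method: it uses plain UCB1 over independent arms rather than a causal bandit, and a simple two-window mean-shift test as a stand-in for the restarted Bayesian online change-point detector. The class name RestartingUCB and the parameters window and threshold are hypothetical and chosen only for illustration.

```python
import math


class RestartingUCB:
    """UCB1 whose statistics are reset when a simple detector flags a change.

    Illustrative only: independent arms and a two-window mean-shift test
    stand in for the causal bandit and the RBOCPD detector of the paper.
    """

    def __init__(self, n_arms, window=30, threshold=0.4):
        self.n_arms = n_arms
        self.window = window        # hypothetical detector window length
        self.threshold = threshold  # hypothetical mean-shift threshold
        self.restarts = 0
        self._reset()

    def _reset(self):
        # Forget everything learned so far (the "restart").
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms
        self.history = [[] for _ in range(self.n_arms)]
        self.t = 0

    def select(self):
        # Play every arm once, then pick the arm with the largest UCB1 index.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        return max(
            range(self.n_arms),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.history[arm].append(reward)
        if self._changed(self.history[arm]):
            self.restarts += 1
            self._reset()

    def _changed(self, rewards):
        # Stand-in detector: compare the means of the two most recent windows.
        w = self.window
        if len(rewards) < 2 * w:
            return False
        older = sum(rewards[-2 * w:-w]) / w
        newer = sum(rewards[-w:]) / w
        return abs(newer - older) > self.threshold
```

In use, an environment loop would call select() to choose an arm, observe a reward, and pass both to update(); the restarts counter then records how many times the learner discarded its statistics. A detector this simple only monitors the arm being pulled and mixes pre- and post-change samples in its windows, which is one reason a principled detector is preferable.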
