
Workshop: Human in the Loop Learning (HiLL) Workshop at NeurIPS 2022

"I pick you choose": Joint human-algorithm decision making in multi-armed bandits

Kate Donahue · Sreenivas Gollapudi · Kostas Kollias


Online learning in multi-armed bandits has been a rich area of research for decades, resulting in numerous "no-regret" algorithms that efficiently learn the arm with the highest expected reward. However, in many settings the final decision of which arm to pull isn't under the control of the algorithm itself. For example, a driving app typically suggests a subset of routes (arms) to the driver, who makes the final choice about which to select. Typically, the human also wishes to learn the optimal arm based on historical reward information, but decides which arm to pull based on a potentially different objective function, such as being more or less myopic about exploiting near-term rewards. In this paper, we show when this joint human-algorithm system can achieve good performance. Specifically, we explore multiple possible frameworks for human objectives and give theoretical regret bounds. Finally, we include experimental results exploring how regret varies with the human decision-maker's objective, as well as the number of arms presented.
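The "algorithm picks a subset, human chooses within it" interaction described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: the function name `simulate`, the use of UCB1 indices to select the presented arms, the fully myopic human who picks the presented arm with the highest empirical mean, and the Bernoulli rewards are all hypothetical choices made for illustration.

```python
import math
import random


def simulate(means, k, horizon, seed=0):
    """Simulate a joint human-algorithm bandit (illustrative assumptions only).

    means   -- true Bernoulli reward probability of each arm
    k       -- number of arms the algorithm presents each round
    horizon -- number of rounds
    Returns (cumulative expected regret, pull counts per arm).
    """
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n       # times each arm was pulled
    totals = [0.0] * n    # summed reward per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        # Assumed algorithm: rank arms by UCB1 index and present the top k.
        def ucb(i):
            if pulls[i] == 0:
                return math.inf  # unexplored arms are always presented first
            return totals[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])

        presented = sorted(range(n), key=ucb, reverse=True)[:k]

        # Assumed human: fully myopic, picks the presented arm with the
        # highest empirical mean (unpulled arms are treated as mean 0).
        def empirical_mean(i):
            return totals[i] / pulls[i] if pulls[i] else 0.0

        choice = max(presented, key=empirical_mean)

        reward = 1.0 if rng.random() < means[choice] else 0.0
        pulls[choice] += 1
        totals[choice] += reward
        regret += best - means[choice]

    return regret, pulls


if __name__ == "__main__":
    # Compare regret as the number of presented arms varies.
    for k in (1, 2, 3):
        regret, pulls = simulate([0.2, 0.5, 0.8], k=k, horizon=2000)
        print(f"k={k}: regret={regret:.1f}, pulls={pulls}")
```

With `k=1` the human has no real choice and the system reduces to the algorithm alone; larger `k` gives the assumed myopic human more influence over which arm is pulled.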
