
Workshop: Attributing Model Behavior at Scale (ATTRIB)

Risk Aversion of Online Learning Algorithms

Andreas Haupt · Aroon Narayanan

Abstract: We study a novel bias in online decision-making: emergent risk aversion. When presented with actions of the same expected reward, $\varepsilon$-Greedy chooses the lower-variance action with probability approaching one. Upper Confidence Bound avoids this by debiasing its estimates of arm rewards. As we show in experiments, risk aversion also shapes arm choices in finite time.
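The effect described in the abstract can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' experimental setup: it assumes a two-armed bandit in which both arms have mean reward 0, arm 0 is deterministic, arm 1 has Gaussian noise, and $\varepsilon$ is held fixed. The function name `epsilon_greedy_low_variance_share` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def epsilon_greedy_low_variance_share(horizon=2000, epsilon=0.1,
                                      noise_std=1.0, seed=0):
    """Run epsilon-greedy on two arms with equal mean reward (0).

    Assumed setup (not taken from the paper): arm 0 always pays 0,
    arm 1 pays Gaussian noise with mean 0 and std `noise_std`.
    Returns the fraction of rounds in which arm 0 was pulled.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(2, dtype=int)   # number of pulls per arm
    means = np.zeros(2)               # running sample-mean reward per arm

    for t in range(horizon):
        if t < 2:
            arm = t                          # pull each arm once to initialise
        elif rng.random() < epsilon:
            arm = int(rng.integers(2))       # explore: uniform over both arms
        else:
            arm = int(np.argmax(means))      # exploit: highest sample mean

        reward = 0.0 if arm == 0 else rng.normal(0.0, noise_std)

        # Incremental update of the sample mean for the pulled arm.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts[0] / horizon


# Average the low-variance arm's share of pulls over independent runs.
shares = [epsilon_greedy_low_variance_share(seed=s) for s in range(100)]
print(f"mean share of pulls on the low-variance arm: {np.mean(shares):.3f}")
```

One intuition for this toy setup: when the noisy arm's sample mean falls below 0, the arm is pulled only during exploration, so its estimate corrects slowly; when the sample mean is above 0, the arm is pulled often and the estimate reverts quickly. Comparing the printed share against 0.5, the value an unbiased learner would give by symmetry, shows how strong the asymmetry is for a given `epsilon` and `horizon`.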
