How humans make repeated choices among options with imperfectly known reward outcomes is an important problem in psychology and neuroscience. It is often studied using the multi-armed bandit task, which is also frequently studied in machine learning. We present data from a human stationary bandit experiment, in which we vary the average abundance and variability of reward availability (the mean and variance of the reward rate distributions). Surprisingly, we find that subjects significantly underestimate the prior mean of reward rates, based on their self-reports, at the end of a game, of their reward expectation for non-chosen arms. Previously, human learning in the bandit task was found to be well captured by a Bayesian ideal learning model, the Dynamic Belief Model (DBM), albeit under an incorrect generative assumption about the temporal structure: humans assume reward rates can change over time even though they are actually fixed. We find that this "pessimism bias" in the bandit task is well captured by the prior mean of DBM when fitted to human choices, but poorly captured by the prior mean of the Fixed Belief Model (FBM), an alternative Bayesian model that (correctly) assumes reward rates to be constant. The pessimism bias is also incompletely captured by a simple reinforcement learning (RL) model commonly used in neuroscience and psychology, in terms of its fitted initial Q-values. While it seems sub-optimal, and thus mysterious, that humans underestimate the prior reward expectation, our simulations show that an underestimated prior mean helps to maximize long-term gain if the observer assumes volatility when reward rates are in fact stable and uses a softmax decision policy instead of the optimal one (obtainable by dynamic programming). This raises the intriguing possibility that the brain underestimates reward rates to compensate for the incorrect non-stationarity assumption in the generative model and for a simplified decision policy.
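The contrast between DBM and FBM described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary (Bernoulli) rewards, a discretized Beta prior over reward rates, and a DBM in which each arm's reward rate stays the same with probability `gamma` and is otherwise redrawn from the prior, so that `gamma = 1` reduces to FBM. The names `GRID`, `beta_prior`, `predictive`, `update`, `softmax`, and `play`, and the parameters `strength` and `beta`, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Discretized reward rates (assumption: Bernoulli rewards with rate in (0, 1)).
GRID = np.linspace(0.005, 0.995, 100)


def beta_prior(mean, strength=2.0):
    """Discretized Beta prior with the given mean (strength is a hypothetical knob)."""
    a, b = mean * strength, (1.0 - mean) * strength
    p = GRID ** (a - 1.0) * (1.0 - GRID) ** (b - 1.0)
    return p / p.sum()


def predictive(posterior, prior, gamma):
    """DBM step: keep the old belief with prob. gamma, else reset to the prior.
    gamma = 1 gives the Fixed Belief Model (no assumed change)."""
    return gamma * posterior + (1.0 - gamma) * prior


def update(belief, reward):
    """Bayes update of a belief over GRID after one binary outcome."""
    likelihood = GRID if reward else 1.0 - GRID
    post = belief * likelihood
    return post / post.sum()


def softmax(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    z = beta * (values - values.max())  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def play(true_rates, prior_mean, gamma, beta, n_trials, rng):
    """Simulate one game; return the total reward collected."""
    prior = beta_prior(prior_mean)
    beliefs = [prior.copy() for _ in true_rates]
    total = 0
    for _ in range(n_trials):
        # Every arm's belief drifts toward the prior, chosen or not.
        beliefs = [predictive(b, prior, gamma) for b in beliefs]
        means = np.array([b @ GRID for b in beliefs])
        arm = rng.choice(len(true_rates), p=softmax(means, beta))
        reward = rng.random() < true_rates[arm]
        beliefs[arm] = update(beliefs[arm], reward)
        total += int(reward)
    return total
```

Under these assumptions, calling `play` with the same `true_rates` and different `prior_mean` values (for example with `rng = np.random.default_rng(0)`) and averaging over many games is one way to explore how the assumed prior mean interacts with `gamma` and the softmax policy.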
Author Information
Dalin Guo (UC San Diego)
Angela Yu (UC San Diego)
More from the Same Authors
- 2021: Panel I: Human decisions
  Jennifer Trueblood · Alex Peysakhovich · Angela Yu · Ori Plonsky · Tal Yarkoni · Daniel Bjorkegren
- 2019: Panel Discussion led by Grace Lindsay
  Grace Lindsay · Blake Richards · Doina Precup · Jacqueline Gottlieb · Jeff Clune · Jane Wang · Richard Sutton · Angela Yu · Ida Momennejad
- 2019: Invited Talk #6: Features or Bugs: Synergistic Idiosyncrasies in Human Learning and Decision-Making
  Angela Yu
- 2018 Poster: Demystifying excessively volatile human learning: A Bayesian persistent prior and a neural approximation
  Chaitanya Ryali · Gautam Reddy · Angela Yu
- 2018 Poster: Beauty-in-averageness and its contextual modulations: A Bayesian statistical account
  Chaitanya Ryali · Angela Yu
- 2017: Computational modeling of human face processing
  Angela Yu
- 2017: Workshop overview
  Michael Mozer · Angela Yu · Brenden Lake
- 2017 Workshop: Cognitively Informed Artificial Intelligence: Insights From Natural Intelligence
  Michael Mozer · Brenden Lake · Angela Yu
- 2013 Poster: Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
  Shunan Zhang · Angela Yu
- 2013 Poster: Context-sensitive active sensing in humans
  Sheeraz Ahmad · He Huang · Angela Yu
- 2012 Poster: Strategic Impatience in Go/NoGo versus Forced-Choice Decision-Making
  Pradeep Shenoy · Angela Yu
- 2012 Oral: Strategic Impatience in Go/NoGo versus Forced-Choice Decision-Making
  Pradeep Shenoy · Angela Yu
- 2010 Oral: A rational decision making framework for inhibitory control
  Pradeep Shenoy · Rajesh PN Rao · Angela Yu
- 2010 Poster: A rational decision making framework for inhibitory control
  Pradeep Shenoy · Rajesh PN Rao · Angela Yu
- 2008 Poster: Sequential effects: Superstition or rational behavior?
  Angela Yu · Jonathan D Cohen
- 2008 Spotlight: Sequential effects: Superstition or rational behavior?
  Angela Yu · Jonathan D Cohen
- 2007 Spotlight: Sequential Hypothesis Testing under Stochastic Deadlines
  Peter Frazier · Angela Yu
- 2007 Poster: Sequential Hypothesis Testing under Stochastic Deadlines
  Peter Frazier · Angela Yu
- 2006 Poster: Optimal Change-Detection and Spiking Neurons
  Angela Yu