
Environmental statistics and the trade-off between model-based and TD learning in humans
Dylan A Simon · Nathaniel D Daw

Tue Dec 13, 08:45 AM to 02:59 PM (PST)

There is much evidence that humans and other animals use a combination of model-based and model-free RL methods. Although it has been proposed that each system may dominate in the circumstances where it is relatively more statistically efficient, there is little specific evidence, especially in humans, about the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches in situations where the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is at its greatest relative disadvantage when volatility is high and noise is low. We present data from a decision-making experiment that manipulates these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
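The link between volatility, noise, and learning rate can be illustrated with a standard scalar Kalman filter tracking a reward mean that drifts as a Gaussian random walk. This is a swapped-in textbook illustration, not necessarily the authors' model: "volatility" is assumed to be the per-trial drift variance `q`, and "noise" the observation variance `r`. The function names below are hypothetical. The steady-state Kalman gain plays the role of a learning rate, and it rises as `q/r` grows.

```python
import math


def steady_state_gain(volatility: float, noise: float) -> float:
    """Closed-form steady-state Kalman gain (effective learning rate).

    Assumed model (illustrative only): the mean drifts as a random walk with
    variance `volatility` (q) per trial and is observed with Gaussian noise of
    variance `noise` (r).
    """
    if volatility <= 0 or noise <= 0:
        raise ValueError("variances must be positive")
    q, r = volatility, noise
    # Steady-state prior variance p satisfies p^2 - q*p - q*r = 0 (positive root).
    p = (q + math.sqrt(q * q + 4 * q * r)) / 2
    return p / (p + r)


def iterated_gain(volatility: float, noise: float, trials: int = 500) -> float:
    """Run the Kalman variance recursion until the gain settles."""
    posterior_var = 1.0  # arbitrary starting uncertainty
    gain = 0.0
    for _ in range(trials):
        prior_var = posterior_var + volatility   # drift inflates uncertainty
        gain = prior_var / (prior_var + noise)   # weight on the new observation
        posterior_var = (1 - gain) * prior_var   # observation shrinks uncertainty
    return gain


if __name__ == "__main__":
    # Compare learning rates across the four volatility/noise combinations.
    for q in (0.1, 1.0):
        for r in (0.1, 1.0):
            print(f"volatility={q:<4} noise={r:<4} gain={steady_state_gain(q, r):.3f}")
```

Under these assumptions the gain is highest in the high-volatility, low-noise cell, the same regime in which the abstract reports TD learning to be most disadvantaged. The sketch shows only the learning-rate side of the argument; it does not simulate the model-based vs. TD comparison itself.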

Author Information

Dylan A Simon (New York University)
Nathaniel D Daw (New York University)

Nathaniel Daw is Assistant Professor of Neural Science and Psychology and Affiliated Assistant Professor of Computer Science at New York University. Before this, he completed his PhD in Computer Science at Carnegie Mellon University and pursued postdoctoral research at the Gatsby Computational Neuroscience Unit at UCL. His research concerns reinforcement learning and decision making from a computational perspective, particularly the application of computational models to the analysis of behavioral and neural data. He is the recipient of a McKnight Scholar Award, a NARSAD Young Investigator Award, and a Royal Society USA Research Fellowship.
