Recent theoretical and experimental results suggest that the dopamine system implements distributional temporal difference backups, allowing the entire distribution of the long-run value of a state to be learned rather than just its expected value. However, the distributional codes explored so far rely on a complex imputation step that crucially requires spatial non-locality: in order to compute reward prediction errors, units must know not only their own state but also the states of the other units. It is far from clear how these steps could be implemented in realistic neural circuits. Here, we introduce the Laplace code: a local temporal difference code for distributional reinforcement learning that is representationally powerful and computationally straightforward. The code decomposes value distributions and prediction errors across three separate dimensions: reward magnitude (related to distributional quantiles), temporal discounting (related to the Laplace transform of future rewards) and time horizon (related to eligibility traces). Besides lending itself to a local learning rule, the decomposition recovers the temporal evolution of the immediate reward distribution, indicating all possible rewards at all future times. This increases representational capacity and allows for temporally flexible computations that immediately adjust to changing horizons or discount factors.
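To make the decomposition concrete, the following is a minimal sketch (not the authors' reference implementation) of a Laplace-style local distributional TD learner in a toy chain MDP. It assumes a grid of units indexed by a reward threshold h (magnitude axis) and a discount factor gamma (Laplace axis), omits the eligibility-trace axis for brevity, and reads out the time-resolved reward distribution by a ridge-regularized inversion of the discount axis. The environment and all names (`REWARD_STATE`, `thresholds`, `gammas`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain MDP: states 0..T-1, deterministic s -> s+1 transitions, and a
# stochastic reward (uniform over REWARD_VALUES) delivered on entering
# state REWARD_STATE; every other transition yields zero reward.
T, REWARD_STATE = 8, 5
REWARD_VALUES = np.array([0.2, 0.5, 0.9])

def step(s):
    s_next = min(s + 1, T - 1)
    r = rng.choice(REWARD_VALUES) if s_next == REWARD_STATE else 0.0
    return s_next, r

# Population of units: reward thresholds h (magnitude axis) x discounts gamma
# (Laplace axis). V[h, g, s] is the value learned by unit (h, g) for state s.
thresholds = np.linspace(0.1, 1.0, 10)
gammas = np.linspace(0.3, 0.99, 12)
V = np.zeros((len(thresholds), len(gammas), T))

n_episodes = 50000
for ep in range(n_episodes):
    alpha = 0.1 / (1.0 + 1e-3 * ep)   # decaying learning rate
    s = 0
    while s < T - 1:
        s_next, r = step(s)
        # Local TD update: each unit uses only its own threshold, its own
        # discount, and its own value estimates at the current and next state.
        indicator = (r > thresholds).astype(float)[:, None]            # (H, 1)
        delta = indicator + gammas[None, :] * V[:, :, s_next] - V[:, :, s]
        V[:, :, s] += alpha * delta
        s = s_next

# Read-out: V[h, g](0) ~= sum_t gamma^t * p_t(h), where p_t(h) is the
# probability that the reward t+1 steps ahead exceeds h. Approximately invert
# the discount axis (a discrete Laplace transform) with ridge regression to
# recover p_t(h) for every horizon t.
t_grid = np.arange(T)
basis = gammas[:, None] ** t_grid[None, :]                              # (G, T)
decode = np.linalg.solve(basis.T @ basis + 1e-3 * np.eye(T), basis.T)   # (T, G)
p_exceed = V[:, :, 0] @ decode.T                                        # (H, T)

# The reward arrives on the REWARD_STATE-th transition from the start state,
# i.e. at lag REWARD_STATE - 1 under the convention used above.
t_star = REWARD_STATE - 1
print("Recovered vs. true P(reward > h) at the reward time:")
for h, p in zip(thresholds, p_exceed[:, t_star]):
    print(f"  h = {h:.2f}: recovered {p:+.2f}, true {np.mean(REWARD_VALUES > h):.2f}")
```

Because each unit's update depends only on its own threshold, discount and value estimates, the learning rule is spatially local; the harder work is deferred to the read-out, where inverting the (ill-conditioned) discount axis yields an approximate reconstruction of which rewards can occur at which future times, as described in the abstract.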
Author Information
Pablo Tano (University of Geneva)
Peter Dayan (Max Planck Institute for Biological Cybernetics)
Alexandre Pouget (University of Geneva)
More from the Same Authors
- 2021 Spotlight: Two steps to risk sensitivity »
  Christopher Gagne · Peter Dayan
- 2021 : Catastrophe, Compounding & Consistency in Choice »
  Christopher Gagne · Peter Dayan
- 2022 : A (dis-)information theory of revealed and unrevealed preferences »
  Nitay Alon · Lion Schulz · Peter Dayan · Jeffrey S Rosenschein
- 2023 : Multi-timescale reinforcement learning in the brain »
  Paul Masset · Pablo Tano · HyungGoo Kim · Athar Malik · Alexandre Pouget · Naoshige Uchida
- 2023 : Cognitive Information Filters: Algorithmic Choice Architecture for Boundedly Rational Choosers »
  Stefan Bucher · Peter Dayan
- 2023 Poster: Reinforcement Learning with Simple Sequence Priors »
  Tankred Saanum · Noemi Elteto · Peter Dayan · Marcel Binz · Eric Schulz
- 2021 Poster: Two steps to risk sensitivity »
  Christopher Gagne · Peter Dayan
- 2020 : Panel Discussions »
  Grace Lindsay · George Konidaris · Shakir Mohamed · Kimberly Stachenfeld · Peter Dayan · Yael Niv · Doina Precup · Catherine Hartley · Ishita Dasgupta
- 2020 Poster: Dynamic allocation of limited memory resources in reinforcement learning »
  Nisheet Patel · Luigi Acerbi · Alexandre Pouget
- 2019 Poster: Disentangled behavioural representations »
  Amir Dezfouli · Hassan Ashtiani · Omar Ghattas · Richard Nock · Peter Dayan · Cheng Soon Ong
- 2014 Poster: Optimal decision-making with time-varying evidence reliability »
  Jan Drugowitsch · Ruben Moreno-Bote · Alexandre Pouget
- 2014 Spotlight: Optimal decision-making with time-varying evidence reliability »
  Jan Drugowitsch · Ruben Moreno-Bote · Alexandre Pouget
- 2013 Poster: Demixing odors - fast inference in olfaction »
  Agnieszka Grabska-Barwinska · Jeff Beck · Alexandre Pouget · Peter E Latham
- 2013 Spotlight: Demixing odors - fast inference in olfaction »
  Agnieszka Grabska-Barwinska · Jeff Beck · Alexandre Pouget · Peter E Latham