Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which the termination time is random and non-exponentially distributed. We derive a Hamilton–Jacobi–Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one aims to recover properties of the discount function from decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
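The abstract motivates hyperbolic discounting as what emerges when the task terminates at a random time whose hazard rate is uncertain. The sketch below is our own illustration, not code from the paper; the Gamma prior over the hazard rate and its parameters k and theta are assumptions made for the example. It checks numerically that averaging exponential discounts over a Gamma-distributed hazard rate gives the generalized hyperbolic discount (1 + theta*t)^(-k).

```python
import numpy as np

# Illustrative sketch (assumption, not the paper's code): interpret the
# discount function as the survival probability of a random termination time.
# With a known hazard rate lam, the discount is exponential, exp(-lam * t).
# If lam is uncertain with a Gamma(k, theta) prior, the marginal discount
# E[exp(-lam * t)] = (1 + theta * t)**(-k) is (generalized) hyperbolic.

rng = np.random.default_rng(0)
k, theta = 1.0, 0.5             # assumed Gamma shape/scale of the hazard rate
t = np.linspace(0.0, 10.0, 6)   # evaluation times

# Monte-Carlo estimate of the expected exponential discount over the prior
lam = rng.gamma(shape=k, scale=theta, size=100_000)
mc_discount = np.exp(-np.outer(t, lam)).mean(axis=1)

# Closed-form generalized hyperbolic discount for comparison
hyperbolic = (1.0 + theta * t) ** (-k)

print(np.max(np.abs(mc_discount - hyperbolic)))  # small Monte-Carlo error
```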
Author Information
Matthias Schultheis (Technische Universität Darmstadt)
Constantin Rothkopf (Technische Universität Darmstadt)
Heinz Koeppl (Technische Universität Darmstadt)
More from the Same Authors
- 2021 Spotlight: Variational Inference for Continuous-Time Switching Dynamical Systems »
  Lukas Köhs · Bastian Alt · Heinz Koeppl
- 2022 Poster: Forward-Backward Latent State Inference for Hidden Continuous-Time semi-Markov Chains »
  Nicolai Engelmann · Heinz Koeppl
- 2021 Poster: Variational Inference for Continuous-Time Switching Dynamical Systems »
  Lukas Köhs · Bastian Alt · Heinz Koeppl
- 2021 Poster: Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System »
  Matthias Schultheis · Dominik Straub · Constantin Rothkopf
- 2020 Poster: POMDPs in Continuous Time and Discrete Spaces »
  Bastian Alt · Matthias Schultheis · Heinz Koeppl
- 2019 Poster: Scalable Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data »
  Dominik Linzner · Michael Schmidt · Heinz Koeppl
- 2019 Poster: Correlation Priors for Reinforcement Learning »
  Bastian Alt · Adrian Šošić · Heinz Koeppl
- 2018 Poster: Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data »
  Dominik Linzner · Heinz Koeppl
- 2016 Poster: Catching heuristics are optimal control policies »
  Boris Belousov · Gerhard Neumann · Constantin Rothkopf · Jan Peters