Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world, safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and we show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm, Soft Actor-Critic, in both real-time and non-real-time settings. Code and videos can be found at https://github.com/rmst/rtrl.
Author Information
Simon Ramstedt (Mila)
Chris Pal (Montreal Institute for Learning Algorithms, École Polytechnique, Université de Montréal)
More from the Same Authors
- 2021: Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning
  Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer
- 2021: Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization
  Alexandre Piche · Joseph Marino · Gian Maria Marconi · Valentin Thomas · Chris Pal · Mohammad Emtiyaz Khan
- 2022: Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models
  Vikram Voleti · Chris Pal · Adam Oberman
- 2022: Implicit Offline Reinforcement Learning via Supervised Learning
  Alexandre Piche · Rafael Pardinas · David Vazquez · Igor Mordatch · Chris Pal
- 2022: A General-Purpose Neural Architecture for Geospatial Systems
  Martin Weiss · Nasim Rahaman · Frederik Träuble · Francesco Locatello · Alexandre Lacoste · Yoshua Bengio · Erran Li Li · Chris Pal · Bernhard Schölkopf
- 2022 Poster: Attention-based Neural Cellular Automata
  Mattie Tesfaldet · Derek Nowrouzezahrai · Chris Pal
- 2022 Poster: Neural Attentive Circuits
  Martin Weiss · Nasim Rahaman · Francesco Locatello · Chris Pal · Yoshua Bengio · Bernhard Schölkopf · Erran Li Li · Nicolas Ballas
- 2022 Poster: MCVD - Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
  Vikram Voleti · Alexia Jolicoeur-Martineau · Chris Pal
- 2020 Poster: Measuring Systematic Generalization in Neural Proof Generation with Transformers
  Nicolas Gontier · Koustuv Sinha · Siva Reddy · Chris Pal