Workshop: A causal view on dynamical systems

Online Learning of Optimal Control Signals in Stochastic Linear Dynamical Systems

Mohamad Kazem Shirani Faradonbeh


Among the most canonical systems are linear time-invariant dynamics governed by differential equations and stochastic disturbances. An interesting problem in this class of systems is learning to minimize a quadratic cost function when the system matrices are unknown. This work initiates the theoretical analysis of implementable reinforcement learning policies for balancing exploration versus exploitation in such systems. We present an online policy that quickly learns the optimal control actions by carefully randomizing the parameter estimates to explore. More precisely, we establish performance guarantees for the presented policy, showing that the regret grows as the \emph{square root of time} multiplied by the \emph{number of parameters}. An implementation of the policy for a flight control task shows its efficacy. Further, we prove tight results that ensure stability under inexact system matrices and fully specify the unavoidable performance degradation caused by a non-optimal policy. To obtain the results, we conduct a novel analysis of matrix perturbation, bound comparative ratios of stochastic integrals, and introduce the new method of policy differentiation. We expect these technical contributions to provide a useful cornerstone for continuous-time reinforcement learning.
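To make the setting concrete, the following is a minimal sketch of the scalar case, not the policy presented in the talk. It assumes a one-dimensional system dx = (a x + b u) dt + dw with running cost q x^2 + r u^2, computes the optimal linear feedback from the scalar Riccati equation, and then checks whether a gain designed from an inexact estimate of (a, b) still stabilizes the true system. The function names and all numerical values are hypothetical and chosen only for illustration; the learning and randomization steps are not shown.

```python
import math


def lq_gain(a, b, q=1.0, r=1.0):
    """Optimal feedback gain k for the scalar system dx = (a*x + b*u) dt + dw
    with running cost q*x**2 + r*u**2; the control is u = -k*x."""
    if b == 0:
        raise ValueError("b must be nonzero for the system to be controllable")
    # Stabilizing root of the scalar Riccati equation
    #   2*a*p - (b*p)**2 / r + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def closed_loop_rate(a_true, b_true, k):
    """Drift coefficient of the true system under u = -k*x.
    A negative value means the closed loop is stable."""
    return a_true - b_true * k


# Hypothetical true parameters (open loop is unstable because a > 0).
a_true, b_true = 0.5, 1.0

# Gain designed with exact knowledge of the system.
k_opt = lq_gain(a_true, b_true)

# Gain designed from an inexact estimate of the parameters (assumed values).
k_hat = lq_gain(0.7, 0.8)

rate_opt = closed_loop_rate(a_true, b_true, k_opt)
rate_hat = closed_loop_rate(a_true, b_true, k_hat)

print("closed-loop rate, exact parameters:  ", rate_opt)
print("closed-loop rate, inexact parameters:", rate_hat)
```

In this example both closed-loop rates are negative, so the gain built from the inexact estimate still stabilizes the true system, although it is no longer cost-optimal. With exact parameters and q = r = 1 the closed-loop rate equals -sqrt(a^2 + b^2).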
