Skip to yearly menu bar Skip to main content


( events)   Timezone:  
Poster
Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #325
Thompson Sampling Efficiently Learns to Control Diffusion Processes
Mohamad Kazem Shirani Faradonbeh · Mohamad Sadegh Shirani Faradonbeh · Mohsen Bayati
[ OpenReview

Diffusion processes that evolve according to linear stochastic differential equations are an important family of continuous-time dynamic decision-making models. Optimal policies are well-studied for them, under full certainty about the drift matrices. However, little is known about data-driven control of diffusion processes with uncertain drift matrices as conventional discrete-time analysis techniques are not applicable. In addition, while the task can be viewed as a reinforcement learning problem involving exploration and exploitation trade-off, ensuring system stability is a fundamental component of designing optimal policies. We establish that the popular Thompson sampling algorithm learns optimal actions fast, incurring only a square-root of time regret, and also stabilizes the system in a short time period. To the best of our knowledge, this is the first such result for Thompson sampling in a diffusion process control problem. We validate our theoretical results through empirical simulations with real matrices. Moreover, we observe that Thompson sampling significantly improves (worst-case) regret, compared to the state-of-the-art algorithms, suggesting Thompson sampling explores in a more guarded fashion. Our theoretical analysis involves characterization of a certain \emph{optimality manifold} that ties the local geometry of the drift parameters to the optimal control of the diffusion process. We expect this technique to be of broader interest.