Timezone: »

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
Morteza Ibrahimi · Adel Javanmard · Benjamin Van Roy

Mon Dec 03 07:00 PM -- 12:00 AM (PST) @ Harrah’s Special Events Center 2nd Floor
We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, an asymptotic regret bound of $\tilde{O}(\sqrt{T})$ was shown for $T \gg p$ where $p$ is the dimension of the state space. In this work we consider the case where the matrices describing the dynamic of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that for $p \gg 1$ and $T \gg \polylog(p)$ achieves a regret bound of $\tilde{O}(p \sqrt{T})$. In particular, our algorithm has an average cost of $(1+\eps)$ times the optimum cost after $T = \polylog(p) O(1/\eps^2)$. This is in comparison to previous work on the dense dynamics where the algorithm needs $\Omega(p)$ samples before it can estimate the unknown dynamic with any significant accuracy. We believe our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks.

Author Information

Morteza Ibrahimi (Google)
Adel Javanmard (Stanford University)
Benjamin Van Roy (Stanford University)

More from the Same Authors