

Poster

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah · Dogyoon Song · Zhi Xu · Yuzhe Yang

Poster Session 1 #559

Abstract: We consider the question of learning the $Q$-function in a sample-efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If the $Q$-function is Lipschitz continuous, then the minimal sample complexity for estimating an $\epsilon$-optimal $Q$-function is known to scale as $\Omega\big(\tfrac{1}{\epsilon^{d_1+d_2+2}}\big)$ per classical non-parametric learning theory, where $d_1$ and $d_2$ denote the dimensions of the state and action spaces respectively. The $Q$-function, when viewed as a kernel, induces a Hilbert-Schmidt operator and hence possesses a square-summable spectrum. This motivates us to consider a parametric class of $Q$-functions parameterized by their "rank" $r$, which contains all Lipschitz $Q$-functions as $r \to \infty$. As our key contribution, we develop a simple, iterative learning algorithm that finds an $\epsilon$-optimal $Q$-function with sample complexity of $\widetilde{O}\big(\tfrac{1}{\epsilon^{\max(d_1,d_2)+2}}\big)$ when the optimal $Q$-function has low rank $r$ and the discounting factor $\gamma$ is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel Matrix Estimation algorithm that faithfully estimates an unknown low-rank matrix in the $\ell_\infty$ sense even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms.
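The abstract does not include pseudocode, so the sketch below is only an illustration of the core idea it describes, not the authors' algorithm: sample noisy $Q$-values at a subset of discretized state-action pairs via the generative model, then complete the matrix with a rank-$r$ estimate. The helper names (`sample_q`, `truncated_svd_estimate`), the mean-imputation/rescaling step, and the 30% sampling rate are all hypothetical choices for a minimal working example.

```python
import numpy as np

def truncated_svd_estimate(Q_obs, mask, rank):
    """Estimate a low-rank matrix from partially observed, noisy entries.

    Q_obs : (n_states, n_actions) array with noisy Q-value samples at observed entries
    mask  : boolean array, True where an entry was sampled
    rank  : target rank r of the estimate

    This is a simple truncated-SVD heuristic, not the paper's Matrix Estimation
    algorithm (which comes with ell-infinity guarantees under bounded noise).
    """
    p = max(mask.mean(), 1e-12)              # fraction of observed entries
    filled = np.where(mask, Q_obs, 0.0) / p  # rescale so entries are unbiased in expectation
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]  # rank-r reconstruction

def low_rank_q_estimation_step(sample_q, states, actions, rank, sample_frac=0.3, seed=0):
    """One hypothetical iteration: query the generative model at a random subset
    of (state, action) pairs, then complete the Q-matrix at rank r."""
    n_s, n_a = len(states), len(actions)
    rng = np.random.default_rng(seed)
    mask = rng.random((n_s, n_a)) < sample_frac
    Q_obs = np.zeros((n_s, n_a))
    for i in range(n_s):
        for j in range(n_a):
            if mask[i, j]:
                # sample_q is a user-supplied noisy estimator of Q(s, a),
                # e.g. a Monte Carlo rollout under the generative model.
                Q_obs[i, j] = sample_q(states[i], actions[j])
    return truncated_svd_estimate(Q_obs, mask, rank)
```

In the paper's setting this kind of completion step would be interleaved with value-iteration-style updates; here it is shown in isolation only to make the "low-rank" structure concrete.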
