Timezone: »

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan

Mon Dec 07 09:00 PM -- 11:00 PM (PST) @ Poster Session 0 #170
The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural networks, and indeed there have been many empirical successes that have exploited such combinations in large-scale applications. There are profound challenges, however, in developing a theory to support this enterprise, most notably the need to take into consideration the exploration-exploitation tradeoff at the core of RL in conjunction with the computational and statistical tradeoffs that arise in modern function-approximation-based learning systems. We approach these challenges by studying an optimistic modification of the least-squares value iteration algorithm, in the context of the action-value function represented by a kernel function or an overparameterized neural network. We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an $\tilde{\mathcal{O}}(\delta_{\cF} H^2 \sqrt{T})$ regret, where $\delta_{\cF}$ characterizes the intrinsic complexity of the function class $\cF$, $H$ is the length of each episode, and $T$ is the total number of episodes. Our regret bounds are independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL.

Author Information

Zhuoran Yang (Princeton)
Chi Jin (Princeton University)
Zhaoran Wang (Northwestern University)
Mengdi Wang (Princeton University)

Mengdi Wang is interested in data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years).

Michael Jordan (UC Berkeley)

More from the Same Authors