Incremental Natural Actor-Critic Algorithms
Shalabh Bhatnagar · Richard Sutton · Mohammad Ghavamzadeh · Mark P Lee

Tue Dec 04 05:20 PM -- 05:30 PM (PST)

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradient in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces, and the use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the policy gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda et al. by using temporal difference learning in the actor and by incorporating natural gradients, and extend prior empirical studies of natural-gradient actor-critic methods by Peters et al. by providing the first convergence proofs and the first fully incremental algorithms.
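
As a rough illustration of the actor-critic structure the abstract describes, the sketch below pairs a temporal difference critic with a policy-gradient actor on two step sizes. It is not one of the four algorithms from the paper. The two-state toy problem, the average-reward form of the TD error, the tabular softmax policy, and every name and step size (`run_actor_critic`, `alpha`, `beta`, `xi`) are assumptions made for this example, and the natural-gradient variants are not shown.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(steps=20000, alpha=0.1, beta=0.01, xi=0.01, seed=0):
    """Incremental actor-critic on a hypothetical 2-state, 2-action problem.

    Assumed toy dynamics: taking action a moves the agent to state a, and the
    reward is 1 only when action 1 is taken in state 1.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    v = [0.0] * n_states                                   # critic: value estimates
    theta = [[0.0] * n_actions for _ in range(n_states)]   # actor: softmax preferences
    avg_reward = 0.0                                       # running average-reward estimate
    s = 0
    for _ in range(steps):
        probs = softmax(theta[s])
        a = 0 if rng.random() < probs[0] else 1
        s_next = a
        r = 1.0 if (s == 1 and a == 1) else 0.0

        # Track the average reward, then form the TD error against it.
        avg_reward += xi * (r - avg_reward)
        delta = r - avg_reward + v[s_next] - v[s]

        # Critic update (larger step size, so it moves on the faster timescale).
        v[s] += alpha * delta

        # Actor update: TD error times the gradient of log pi(a | s)
        # (smaller step size, so it moves on the slower timescale).
        for b in range(n_actions):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += beta * delta * grad_log_pi

        s = s_next
    return theta, v, avg_reward


if __name__ == "__main__":
    theta, v, avg_reward = run_actor_critic()
    print("P(action 1 | state 1):", softmax(theta[1])[1])
    print("average reward estimate:", avg_reward)
```

Every update uses only the most recent transition, which is the sense in which a method of this kind is incremental. The TD error `delta` stands in for a sampled return in the actor update, which is the variance-reduction idea mentioned above.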

Author Information

Shalabh Bhatnagar (Indian Institute of Science)
Richard Sutton (DeepMind, U Alberta)

Richard S. Sutton is a professor and iCORE chair in the department of computing science at the University of Alberta. He is a fellow of the Association for the Advancement of Artificial Intelligence and co-author of the textbook "Reinforcement Learning: An Introduction" from MIT Press. Before joining the University of Alberta in 2003, he worked in industry at AT&T and GTE Labs, and in academia at the University of Massachusetts. He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978. Rich's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.

Mohammad Ghavamzadeh (Facebook AI Research)
Mark P Lee (University of Alberta)