Poster
Weighted importance sampling for off-policy learning with linear function approximation
Rupam Mahmood · Hado P van Hasselt · Richard Sutton

Mon Dec 08 04:00 PM -- 08:59 PM (PST) @ Level 2, room 210D

Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, *weighted* importance sampling, does not carry over easily to function approximation and, because of this, it is not utilized in existing off-policy learning algorithms. In this paper, we take two steps toward bridging this gap. First, we show that weighted importance sampling can be viewed as a special case of weighting the error of individual training samples, and that this weighting has theoretical and empirical benefits similar to those of weighted importance sampling. Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(λ). We show empirically that our new WIS-LSTD(λ) algorithm can result in much more rapid and reliable convergence than conventional off-policy LSTD(λ) (Yu 2010, Bertsekas & Yu 2009).
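
As a rough illustration of the distinction the abstract draws (a minimal sketch of our own, not code from the paper; the function names and simulated data are hypothetical), the ordinary importance-sampling estimator averages ratio-weighted returns, while the weighted estimator normalizes by the sum of the ratios. The comments also note the error-weighting view from the paper's first contribution: the WIS estimate is exactly the minimizer of a per-sample squared error weighted by the importance ratios.

```python
import numpy as np

# Hypothetical illustration: estimating the target-policy value from returns
# generated by a behavior policy. For episode i, rho_i is the product of the
# per-step likelihood ratios pi(a|s) / mu(a|s), and G_i is the return.

def ordinary_is(returns, ratios):
    # Ordinary importance sampling: unbiased, since E[rho * G] equals the
    # target-policy value, but the variance can be enormous when the
    # ratios vary widely.
    returns, ratios = np.asarray(returns, float), np.asarray(ratios, float)
    return float(np.mean(ratios * returns))

def weighted_is(returns, ratios):
    # Weighted importance sampling: normalizes by the total weight instead
    # of the sample count. Biased for finite n but consistent, and
    # typically far lower variance.
    returns, ratios = np.asarray(returns, float), np.asarray(ratios, float)
    total = ratios.sum()
    # With zero total weight the estimate is undefined; return 0.0 here.
    return float((ratios * returns).sum() / total) if total > 0 else 0.0

# The error-weighting view mentioned in the abstract: weighted_is is the
# solution of the weighted least-squares problem
#     min_v  sum_i rho_i * (G_i - v)**2,
# since setting the derivative to zero gives v = sum(rho * G) / sum(rho).

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.normal(1.0, 1.0, size=1000)       # sampled returns (illustrative)
    rho = rng.lognormal(0.0, 1.0, size=1000)  # importance ratios (illustrative)
    print("OIS:", ordinary_is(G, rho))
    print("WIS:", weighted_is(G, rho))
```

The normalization makes the WIS estimate a convex combination of the observed returns, so it always lies between the smallest and largest return seen; this boundedness is the source of its lower variance relative to the ordinary estimator.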

Author Information

Rupam Mahmood (University of Alberta)
Hado P van Hasselt (DeepMind)
Richard Sutton (DeepMind, University of Alberta)

Richard S. Sutton is a professor and iCORE chair in the department of computing science at the University of Alberta. He is a fellow of the Association for the Advancement of Artificial Intelligence and co-author of the textbook "Reinforcement Learning: An Introduction" from MIT Press. Before joining the University of Alberta in 2003, he worked in industry at AT&T and GTE Labs, and in academia at the University of Massachusetts. He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978. Rich's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.
