Spotlight
Regularized Off-Policy TD-Learning
Bo Liu · Sridhar Mahadevan · Ji Liu
Wed Dec 05 11:44 AM -- 11:48 AM (PST) @ Harveys Convention Center Floor, CC
We present a novel $l_1$ regularized off-policy convergent TD-learning method (termed RO-TD), which learns sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection via online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented, with experiments illustrating the off-policy convergence, sparse feature selection capability, and low computational cost of the algorithm.
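As a rough illustration of the kind of update the abstract describes, the sketch below combines a plain TDC-style gradient step with a soft-thresholding (proximal $l_1$) step on linear features. It is a minimal sketch under stated assumptions, not the paper's RO-TD saddle-point solver; the function name `tdc_l1_step`, the step sizes `alpha` and `beta`, and the regularization weight `rho` are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def tdc_l1_step(theta, w, phi, phi_next, reward, gamma, alpha, beta, rho):
    """One illustrative l1-regularized TDC-style update (hypothetical sketch).

    theta    : primary value-function weights
    w        : auxiliary weights TDC uses to correct the gradient
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    """
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)  # TD error
    # TDC-style step: TD term plus a correction using the auxiliary weights.
    theta = theta + alpha * (delta * phi - gamma * phi_next * phi.dot(w))
    # Soft-threshold to encourage sparse value-function weights.
    theta = soft_threshold(theta, alpha * rho)
    # Auxiliary weights track a least-squares estimate of delta given phi.
    w = w + beta * (delta - phi.dot(w)) * phi
    return theta, w

# Tiny usage example on random features (purely illustrative data).
rng = np.random.default_rng(0)
d = 10
theta, w = np.zeros(d), np.zeros(d)
for _ in range(1000):
    phi, phi_next = rng.normal(size=d), rng.normal(size=d)
    reward = rng.normal()
    theta, w = tdc_l1_step(theta, w, phi, phi_next, reward,
                           gamma=0.9, alpha=0.01, beta=0.01, rho=0.1)
print("nonzero weights:", np.count_nonzero(theta))
```

The soft-thresholding step is what produces sparsity: any weight whose magnitude stays below `alpha * rho` is driven exactly to zero, which is the feature-selection effect the abstract refers to.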
Author Information
Bo Liu (Auburn University)
Sridhar Mahadevan (UMass Amherst)
Ji Liu (University of Wisconsin-Madison)
Related Events (a corresponding poster, oral, or spotlight)
-
2012 Poster: Regularized Off-Policy TD-Learning »
Thu Dec 06 @ Harrah’s Special Events Center 2nd Floor
More from the Same Authors
-
2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
-
2013 Poster: Projected Natural Actor-Critic »
Philip Thomas · William C Dabney · Stephen Giguere · Sridhar Mahadevan
-
2010 Poster: Multi-Stage Dantzig Selector »
Ji Liu · Peter Wonka · Jieping Ye
-
2010 Poster: Basis Construction from Power Series Expansions of Value Functions »
Sridhar Mahadevan · Bo Liu