Timezone: »

Bounding Performance Loss in Approximate MDP Homomorphisms
Doina Precup · Jonathan Taylor Taylor · Prakash Panangaden

Mon Dec 08 08:45 PM -- 12:00 AM (PST) @

We define a metric for measuring behavior similarity between states in a Markov decision process (MDP), in which action similarity is taken into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran \& Barto, 2003). We prove that the difference in the optimal value function of different states can be upper-bounded by the value of this metric, and that the bound is tighter than that provided by bisimulation metrics (Ferns et al. 2004, 2005). Our results hold both for discrete and for continuous actions. We provide an algorithm for constructing approximate homomorphisms, by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic is based mainly on heuristics.

Author Information

Doina Precup (McGill University / Mila / DeepMind Montreal)
Jonathan Taylor Taylor (McGill University)
Prakash Panangaden (McGill University, Montreal)

More from the Same Authors