The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains that require balancing actions that increase an agent's knowledge against actions that increase an agent's reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world, and it explicitly models only visited states. We demonstrate the iPOMDP's utility on several standard problems.
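To make the growing-state-space idea concrete, the sketch below illustrates Chinese-restaurant-process-style state instantiation, where an unbounded state space is modelled lazily and new states appear only as the agent visits them. This is a minimal illustration of the prior's flavour, not the paper's actual inference procedure; the function name, integer state encoding, and concentration parameter `alpha` are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's algorithm): states are
# instantiated lazily via a Chinese restaurant process, so the model never
# commits to a fixed state-space size ahead of time.
import random

def sample_next_state(visit_counts, alpha=1.0):
    """Sample a successor state under a CRP prior.

    visit_counts: dict mapping state id -> number of prior visits.
    Returns an existing state with probability proportional to its visit
    count, or a brand-new state with probability proportional to alpha.
    """
    total = sum(visit_counts.values()) + alpha
    r = random.uniform(0.0, total)
    for state, count in visit_counts.items():
        r -= count
        if r <= 0.0:
            return state
    # Fell through the existing states: instantiate a new, unvisited one.
    return max(visit_counts, default=-1) + 1

# Usage: the set of explicitly modelled states grows only with exploration.
counts = {}
for _ in range(20):
    s = sample_next_state(counts, alpha=2.0)
    counts[s] = counts.get(s, 0) + 1
print(f"{len(counts)} states instantiated after 20 steps: {counts}")
```

Because the probability of a new state is proportional to `alpha` while existing states are weighted by visit counts, the number of instantiated states grows sublinearly with experience, matching the abstract's premise that only visited states need to be modelled explicitly.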
Author Information
Finale P Doshi-Velez (Harvard)
More from the Same Authors
- 2013 Workshop: Machine Learning for Clinical Data Analysis and Healthcare
  Jenna Wiens · Finale P Doshi-Velez · Can Ye · Madalina Fiterau · Shipeng Yu · Le Lu · Balaji R Krishnapuram
- 2010 Poster: Nonparametric Bayesian Policy Priors for Reinforcement Learning
  Finale P Doshi-Velez · David Wingate · Nicholas Roy · Josh Tenenbaum
- 2009 Poster: Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process
  Shakir Mohamed · David A Knowles · Zoubin Ghahramani · Finale P Doshi-Velez