Poster
Cost-Sensitive Exploration in Bayesian Reinforcement Learning
Dongho Kim · Kee-Eung Kim · Pascal Poupart
Thu Dec 06 02:00 PM -- 12:00 AM (PST) @ Harrah’s Special Events Center 2nd Floor
We consider Bayesian reinforcement learning (BRL) in which actions incur costs in addition to rewards, so exploration must be constrained in terms of expected total cost while the agent learns to maximize the expected long-term total reward. To formalize cost-sensitive exploration, we model the environment as a constrained Markov decision process (CMDP), whose cost function naturally encodes exploration requirements. We extend BEETLE, a model-based BRL method, to learn in environments with cost constraints. We demonstrate the resulting cost-sensitive exploration behaviour on a number of simulated problems.
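For readers unfamiliar with CMDPs, the constrained objective described in the abstract can be sketched as follows. This is a generic textbook-style formulation, not necessarily the exact one used in the paper: the discounted infinite-horizon criterion and the symbols $\pi$ (policy), $\gamma$ (discount factor), $R$ (reward function), $C$ (cost function), and $\hat{c}$ (cost bound) are all assumptions introduced here for illustration.

```latex
% Sketch of a CMDP objective (notation assumed, not taken from the paper):
% maximize expected discounted reward subject to a bound on expected discounted cost.
\begin{aligned}
\max_{\pi}\quad
  & V_R^{\pi} = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\right] \\
\text{subject to}\quad
  & V_C^{\pi} = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, C(s_t, a_t)\right] \le \hat{c}
\end{aligned}
```

Under this reading, a cost charged for risky or expensive actions limits how much exploration the learner can afford, which is one way to interpret "encoding exploration requirements using the cost function."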
Author Information
Dongho Kim (PROWLER.io Limited)
Kee-Eung Kim (KAIST)
Pascal Poupart (University of Waterloo)
More from the Same Authors
- 2018 Poster: A Bayesian Approach to Generative Adversarial Imitation Learning
  Wonseok Jeon · Seokin Seo · Kee-Eung Kim
- 2018 Spotlight: A Bayesian Approach to Generative Adversarial Imitation Learning
  Wonseok Jeon · Seokin Seo · Kee-Eung Kim
- 2018 Poster: Monte-Carlo Tree Search for Constrained POMDPs
  Jongmin Lee · Geon-Hyeong Kim · Pascal Poupart · Kee-Eung Kim
- 2017 Poster: Generative Local Metric Learning for Kernel Regression
  Yung-Kyun Noh · Masashi Sugiyama · Kee-Eung Kim · Frank Park · Daniel Lee
- 2012 Poster: Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions
  Jaedeug Choi · Kee-Eung Kim
- 2012 Demonstration: The BUDS POMDP Spoken Dialogue System
  Martin S · Matt Henderson · Catherine Breslin · Milica Gasic · Dongho Kim · Blaise Thomson · Pirros Tsiakoulis · Steve Young
- 2012 Poster: Symbolic Dynamic Programming for Continuous State and Observation POMDPs
  Zahra Zamani · Scott Sanner · Pascal Poupart · Kristian Kersting
- 2011 Poster: Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
  Omar Z Khan · Pascal Poupart · John Agosta
- 2011 Poster: MAP Inference for Bayesian Inverse Reinforcement Learning
  Jaedeug Choi · Kee-Eung Kim
- 2010 Workshop: Machine Learning for Assistive Technologies
  Jesse Hoey · Pascal Poupart · Thomas Ploetz
- 2010 Session: Spotlights Session 8
  Pascal Poupart
- 2010 Session: Oral Session 9
  Pascal Poupart
- 2009 Mini Symposium: Partially Observable Reinforcement Learning
  Marcus Hutter · Will Uther · Pascal Poupart
- 2008 Workshop: Model Uncertainty and Risk in Reinforcement Learning
  Yaakov Engel · Mohammad Ghavamzadeh · Shie Mannor · Pascal Poupart
- 2006 Poster: Automated Hierarchy Discovery for Planning in Partially Observable Domains
  Laurent Charlin · Pascal Poupart · Romy Shioda