Timezone: »
We study several classes of interactive assistants from the points of view of decision theory and computational complexity. We first introduce a class of POMDPs called hidden-goal MDPs (HGMDPs), which formalize the problem of interactively assisting an agent whose goal is hidden and whose actions are observable. In spite of its restricted nature, we show that optimal action selection in finite horizon HGMDPs is PSPACE-complete even in domains with deterministic dynamics. We then introduce a more restricted model called helper action MDPs (HAMDPs), where the assistant's action is accepted by the agent when it is helpful, and can be easily ignored by the agent otherwise. We show classes of HAMDPs that are complete for PSPACE and NP along with a polynomial time class. Furthermore, we show that for general HAMDPs a simple myopic policy achieves a regret, compared to an omniscient assistant, that is bounded by the entropy of the initial goal distribution. A variation of this policy is shown to achieve worst-case regret that is logarithmic in the number of goals for any goal distribution.
Author Information
Alan Fern (Oregon State University)
Prasad Tadepalli (Oregon State University)
More from the Same Authors
-
2021 Spotlight: Optimal Policies Tend To Seek Power »
Alex Turner · Logan Smith · Rohin Shah · Andrew Critch · Prasad Tadepalli -
2021 : Deep RePReL--Combining Planning and Deep RL for acting in relational domains »
Harsha Kokel · Arjun Manoharan · Sriraam Natarajan · Balaraman Ravindran · Prasad Tadepalli -
2022 : Offline Policy Comparison with Confidence: Benchmarks and Baselines »
Anurag Koul · Mariano Phielipp · Alan Fern -
2022 Poster: Parametrically Retargetable Decision-Makers Tend To Seek Power »
Alex Turner · Prasad Tadepalli -
2021 Poster: One Explanation is Not Enough: Structured Attention Graphs for Image Classification »
Vivswan Shitole · Fuxin Li · Minsuk Kahng · Prasad Tadepalli · Alan Fern -
2021 Poster: Optimal Policies Tend To Seek Power »
Alex Turner · Logan Smith · Rohin Shah · Andrew Critch · Prasad Tadepalli -
2020 Poster: Avoiding Side Effects in Complex Environments »
Alex Turner · Neale Ratzlaff · Prasad Tadepalli -
2020 Spotlight: Avoiding Side Effects in Complex Environments »
Alex Turner · Neale Ratzlaff · Prasad Tadepalli -
2013 Poster: Symbolic Opportunistic Policy Iteration for Factored-Action MDPs »
Aswin Raghavan · Roni Khardon · Alan Fern · Prasad Tadepalli -
2012 Poster: A Bayesian Approach for Policy Learning from Trajectory Preference Queries »
Aaron Wilson · Alan Fern · Prasad Tadepalli -
2011 Poster: Autonomous Learning of Action Models for Planning »
Neville Mehta · Prasad Tadepalli · Alan Fern -
2011 Poster: Inverting Grice's Maxims to Learn Rules from Natural Language Extractions »
M. Shahed Sorower · Thomas Dietterich · Janardhan Rao Doppa · Walker Orr · Prasad Tadepalli · Xiaoli Fern -
2010 Poster: Batch Bayesian Optimization via Simulation Matching »
Javad Azimi · Alan Fern · Xiaoli Fern