
Regret based Robust Solutions for Uncertain Markov Decision Processes
Asrar Ahmed · Pradeep Varakantham · Yossiri Adulyasak · Patrick Jaillet

Fri Dec 06 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on computing {\em maximin} policies, which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed {\em minimax} regret as a suitable alternative to the {\em maximin} objective for robust optimization. However, existing algorithms for handling {\em minimax} regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve along multiple dimensions: (a) handling uncertainty over both transition and reward models; (b) accounting for dependence of model uncertainties across state-action pairs and decision epochs; (c) scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
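The distinction between the two objectives can be illustrated with a toy sketch (this is not the paper's algorithm): given a value matrix over candidate policies and sampled uncertainty realizations, the maximin policy maximizes the worst-case value, while the minimax-regret policy minimizes the worst-case gap to the best achievable value in each realization. The names `values`, `maximin_policy`, and `max_regret` below are illustrative, not from the paper.

```python
# values[p][s]: value of candidate policy p under sampled scenario s.
# (Hypothetical numbers chosen so the two objectives pick different policies.)
values = [
    [10, 2, 9],  # policy 0: high value on average, poor in scenario 1
    [5, 5, 5],   # policy 1: safe but mediocre everywhere
    [8, 4, 7],   # policy 2: balanced across scenarios
]
num_policies = len(values)
num_scenarios = len(values[0])

# Maximin: pick the policy whose worst-case value is largest.
maximin_policy = max(range(num_policies), key=lambda p: min(values[p]))

# Minimax regret: regret of p in scenario s is the gap to the best
# achievable value in s; pick the policy with the smallest worst-case regret.
best_per_scenario = [max(values[p][s] for p in range(num_policies))
                     for s in range(num_scenarios)]

def max_regret(p):
    return max(best_per_scenario[s] - values[p][s]
               for s in range(num_scenarios))

minimax_regret_policy = min(range(num_policies), key=max_regret)

print(maximin_policy, minimax_regret_policy)  # prints "1 2"
```

Here the maximin objective selects the uniformly safe policy 1, while minimax regret prefers policy 2, which stays close to the best achievable value in every scenario; the paper's sampling-based algorithms target this second objective for full MDPs.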

Author Information

Asrar Ahmed (Singapore Management University)
Pradeep Varakantham (Singapore Management University)
Yossiri Adulyasak (University of Montréal (HEC))
Patrick Jaillet (MIT)
