We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expected cumulative cost, in the worst case over parameters that are uncertain. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP that minimizes the expected exponential utility is equivalent to a robust MDP that minimizes the worst-case expectation with a penalty on the deviation of the uncertain parameters from their nominal values, where the deviation is measured by the Kullback-Leibler divergence. We also show that a risk-sensitive MDP that minimizes an iterated risk measure composed of certain coherent risk measures is equivalent to a robust MDP that minimizes the worst-case expectation when the possible deviations of the uncertain parameters from their nominal values are characterized by a concave function.
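The two equivalences can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's code or exact construction. For the first result it assumes the KL penalty weight is 1/θ, in which case the standard Gibbs variational identity (1/θ) log E_p[exp(θX)] = max_q { E_q[X] - KL(q‖p)/θ } holds, with the maximizer q proportional to p·exp(θX). For the second result it assumes CVaR as the coherent risk measure, using its well-known dual as a worst-case expectation over distributions q with 0 ≤ q ≤ p/α; the paper's concave-function characterization is more general than this one case. All helper names (`entropic`, `kl_robust`, `cvar`, `cvar_robust`, `backward_induction`) and the 2-state MDP numbers are hypothetical.

```python
import math

# Illustrative sketch only: hypothetical helpers and a made-up MDP, not the paper's code.

def entropic(p, v, theta):
    """Certainty equivalent of exponential utility: (1/theta) * log E_p[exp(theta * V)]."""
    return math.log(sum(pi * math.exp(theta * vi) for pi, vi in zip(p, v))) / theta

def kl(q, p):
    """KL(q || p) on a shared finite support (terms with q_i = 0 contribute 0)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def kl_robust(p, v, theta):
    """Assumed robust form: max_q E_q[V] - KL(q || p) / theta.
    The maximizer is the exponentially tilted law q* proportional to p * exp(theta * V)."""
    w = [pi * math.exp(theta * vi) for pi, vi in zip(p, v)]
    z = sum(w)
    q = [wi / z for wi in w]
    return sum(qi * vi for qi, vi in zip(q, v)) - kl(q, p) / theta

def cvar(p, v, alpha):
    """CVaR_alpha of a cost via Rockafellar-Uryasev: min_t t + E[(V - t)^+] / alpha.
    The objective is convex and piecewise linear, so its minimum is at a support point."""
    return min(t + sum(pi * max(vi - t, 0.0) for pi, vi in zip(p, v)) / alpha for t in v)

def cvar_robust(p, v, alpha):
    """Worst-case expectation over q with 0 <= q_i <= p_i / alpha and sum(q) = 1.
    Greedily puts as much probability as allowed on the largest costs."""
    value, remaining = 0.0, 1.0
    for vi, pi in sorted(zip(v, p), reverse=True):
        take = min(pi / alpha, remaining)
        value += take * vi
        remaining -= take
    return value

def backward_induction(P, c, horizon, risk):
    """Finite-horizon DP: V_t(s) = min_a c[s][a] + risk(P[s][a], V_{t+1}), with V_H = 0.
    P[s][a] is the next-state distribution; risk maps (probabilities, values) to a number."""
    n = len(P)
    V = [0.0] * n
    for _ in range(horizon):
        V = [min(c[s][a] + risk(P[s][a], V) for a in range(len(P[s]))) for s in range(n)]
    return V

# Hypothetical 2-state, 2-action MDP (made-up numbers for illustration only).
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.6, 0.4]]]
c = [[1.0, 0.5],
     [2.0, 3.0]]
theta, alpha, H = 0.7, 0.25, 5

V_exp = backward_induction(P, c, H, lambda p, v: entropic(p, v, theta))
V_kl = backward_induction(P, c, H, lambda p, v: kl_robust(p, v, theta))
V_cvar = backward_induction(P, c, H, lambda p, v: cvar(p, v, alpha))
V_box = backward_induction(P, c, H, lambda p, v: cvar_robust(p, v, alpha))

for s in range(len(P)):
    print(f"state {s}: exp-utility {V_exp[s]:.6f} vs KL-robust {V_kl[s]:.6f}; "
          f"iterated CVaR {V_cvar[s]:.6f} vs worst-case {V_box[s]:.6f}")
```

Because each one-stage risk value coincides with its robust counterpart, the backward recursions agree state by state (up to floating-point error). The exponential-utility recursion is written on the log scale, which is valid for θ > 0 and a finite horizon.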
Takayuki Osogami (IBM Research - Tokyo)
More from the Same Authors
2014 Poster: Restricted Boltzmann machines modeling human choice
Takayuki Osogami · Makoto Otsuka
2013 Poster: Solving inverse problem of Markov chain with partial observations
Tetsuro Morimura · Takayuki Osogami · Tsuyoshi Ide