

Poster

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Falcon Dai · Matthew Walter

East Exhibition Hall B + C #202

Keywords: [ Markov Decision Processes; Reinforcement Learning ] [ Reinforcement Learning and Planning -> Exploration ]


Abstract:

We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.
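The abstract's central object, potential-based reward shaping [NHR99], replaces each reward with the original reward plus a potential difference. A minimal sketch of this transformation in the undiscounted setting, where the function and variable names (`shape_reward`, `phi`) are illustrative and not taken from the paper:

```python
def shape_reward(r, phi, s, s_next):
    """Potential-based reward shaping [NHR99], undiscounted case:
    r'(s, a, s') = r(s, a, s') + phi(s') - phi(s).

    The shaped MDP is equivalent to the original (same optimal
    policies), but its informativeness, as measured here by MEHC,
    can differ.
    """
    return r + phi[s_next] - phi[s]


# Toy example: a potential that "points toward" state 1.
phi = {0: 0.0, 1: 1.0}

# A transition 0 -> 1 with original reward 0 gains the potential
# difference phi[1] - phi[0] = 1.
shaped = shape_reward(0.0, phi, 0, 1)

# The reverse transition 1 -> 0 loses the same amount, so the sum of
# shaped rewards along any cycle equals the sum of original rewards.
shaped_back = shape_reward(0.0, phi, 1, 0)
```

Because the potential differences telescope along any trajectory between two fixed states, shaping leaves long-run average rewards of policies unchanged while redistributing reward along the way, which is how it can alter the maximum expected hitting cost.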
