

Poster

Explicit Planning for Efficient Exploration in Reinforcement Learning

Liangpeng Zhang · Ke Tang · Xin Yao

East Exhibition Hall B + C #181

Keywords: [ Model-Based RL; Reinfo ] [ Reinforcement Learning and Planning -> Markov Decision Processes ] [ Reinforcement Learning and Planning ] [ Exploration ]


Abstract:

Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being promising theoretically, are essentially greedy strategies that follow some predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted in travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can help alleviate such a problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of some popular strategies, showing how these strategies can fail and spend O(n^2 md) or O(n^2 m + nmd) steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, which can be obtained by VIEC, only needs O(nmd) steps, where n and m are the numbers of states and actions, and d is the data demand. The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests a remarkable potential in explicit planning for exploration.
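The abstract describes VIEC only at a high level, so the following is a minimal sketch of how value iteration over an augmented MDP could compute a minimum exploration cost: the augmented state pairs the current MDP state with the vector of remaining data demands, every step costs one, and augmented states whose demands are all satisfied are terminal. The toy deterministic MDP, the unit demand (d = 1 per state-action pair), the variable names, and the unit-cost formulation are all illustrative assumptions made here; the paper's actual VIEC algorithm may formulate and solve the augmented MDP differently.

```python
import itertools

# Toy deterministic MDP used only for illustration: next_state[s][a] is the
# successor of taking action a in state s. The MDP, the unit data demand,
# and all names below are illustrative assumptions, not the paper's setup.
n_states, n_actions = 3, 2
next_state = [
    [1, 0],  # state 0: action 0 -> state 1, action 1 -> state 0
    [2, 0],  # state 1: action 0 -> state 2, action 1 -> state 0
    [2, 0],  # state 2: action 0 -> state 2, action 1 -> state 0
]
full_demand = [1] * (n_states * n_actions)  # data demand d = 1 per (state, action) pair


def decrement(demand, s, a):
    """Remaining-demand vector after executing action a in state s once."""
    idx = s * n_actions + a
    out = list(demand)
    if out[idx] > 0:
        out[idx] -= 1
    return tuple(out)


# Augmented state space: (current state, remaining-demand vector).
demand_space = list(itertools.product(*[range(d + 1) for d in full_demand]))
V = {(s, dem): 0.0 for s in range(n_states) for dem in demand_space}

# Value iteration on the augmented MDP: every step costs 1, and an augmented
# state with all demands satisfied is terminal with cost 0. V converges to
# the minimum number of steps needed to collect all required data.
for _ in range(100):  # more than enough sweeps for this tiny instance
    for (s, dem) in V:
        if all(v == 0 for v in dem):
            V[(s, dem)] = 0.0
        else:
            V[(s, dem)] = min(
                1.0 + V[(next_state[s][a], decrement(dem, s, a))]
                for a in range(n_actions)
            )

print("minimum exploration cost from state 0:", V[(0, tuple(full_demand))])
```

Acting greedily with respect to the converged values yields a candidate exploration scheme. Note that the demand-vector component makes this brute-force augmented state space grow exponentially with the number of state-action pairs, so the sketch is only viable for tiny illustrative instances.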
