Timezone: »

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
Abhishek Gupta · Aldo Pacchiano · Yuexiang Zhai · Sham Kakade · Sergey Levine

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #303

The success of reinforcement learning in a variety of challenging sequential decision-making problems has been much discussed, but often ignored in this discussion is the consideration of how the choice of reward function affects the behavior of these algorithms. Most practical RL algorithms require copious amounts of reward engineering in order to successfully solve challenging tasks. The idea of this type of ``reward-shaping'' has been often discussed in the literature and is used in practical instantiations, but there is relatively little formal characterization of how the choice of reward shaping can yield benefits in sample complexity for RL problems. In this work, we build on the framework of novelty-based exploration to provide a simple scheme for incorporating shaped rewards into RL along with an analysis tool to show that particular choices of reward shaping provably improve sample efficiency. We characterize the class of problems where these gains are expected to be significant and show how this can be connected to practical algorithms in the literature. We show that these results hold in practice in experimental evaluations as well, providing an insight into the mechanisms through which reward shaping can significantly improve the complexity of reinforcement learning while retaining asymptotic performance.

Author Information

Abhishek Gupta (University of Washington)
Aldo Pacchiano (Microsoft Research)
Yuexiang Zhai (University of California Berkeley)
Sham Kakade (Harvard University & Amazon)
Sergey Levine (UC Berkeley)

More from the Same Authors