Poster
Harnessing Heuristics for Deep Reinforcement Learning via Constrained Optimization
Chi-Chang Lee · Zhang-Wei Hong · Pulkit Agrawal
East Exhibit Hall A-C #4900
In many reinforcement learning (RL) applications, incorporating heuristic signals alongside the exact task objective is crucial for achieving desirable performance. However, heuristics can occasionally lead to policies that are biased and suboptimal for the exact task objective. Common strategies modify the training objective so that the optimal policy under the heuristic-augmented objective remains identical to the optimal policy under the exact task objective. Despite this guarantee, such strategies often underperform in practical scenarios with finite training data. This paper explores alternatives for improving task performance in finite data settings using heuristic signals. Instead of enforcing optimal policy invariance, we aim to train a policy that surpasses one trained solely with heuristics. We propose a constrained optimization procedure that uses the heuristic policy as a reference, ensuring the learned policy always outperforms the heuristic policy on the exact task objective. Our experiments on robotic locomotion, helicopter, and manipulation tasks demonstrate that this method consistently improves performance, regardless of the general effectiveness of the heuristic signals.
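The abstract does not specify the exact optimization scheme, but a common way to realize such a constraint is a Lagrangian relaxation: maximize the heuristic-shaped return subject to the learned policy's exact task return staying at or above the heuristic-trained reference policy's task return. The sketch below is a minimal, hypothetical illustration of that idea; the function names, the softplus-parameterized multiplier, and the scalar stand-ins for return estimates are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def policy_loss(heuristic_return, task_return, task_return_ref, log_lam):
    """Lagrangian policy objective (negated for minimization):
    maximize the heuristic-shaped return while penalizing any shortfall of the
    learned policy's exact task return relative to the reference policy."""
    lam = F.softplus(log_lam).detach()          # multiplier held fixed during the policy step
    violation = task_return_ref - task_return   # > 0 when the constraint is violated
    return -(heuristic_return - lam * violation)


def multiplier_loss(task_return, task_return_ref, log_lam):
    """Dual objective: gradient descent on this loss raises the multiplier while
    the constraint is violated and lowers it once the learned policy is ahead."""
    lam = F.softplus(log_lam)                              # keeps the multiplier non-negative
    violation = (task_return_ref - task_return).detach()  # returns held fixed during the dual step
    return -lam * violation


# Toy usage with scalar stand-ins for critic or Monte Carlo return estimates.
log_lam = torch.zeros((), requires_grad=True)
task_return = torch.tensor(0.8, requires_grad=True)       # exact task return of the learned policy
heuristic_return = torch.tensor(1.2, requires_grad=True)  # heuristic-shaped return of the learned policy
task_return_ref = torch.tensor(1.0)                       # exact task return of the heuristic-trained reference

loss_pi = policy_loss(heuristic_return, task_return, task_return_ref, log_lam)
loss_lam = multiplier_loss(task_return, task_return_ref, log_lam)
```

In a full training loop these two losses would be minimized in alternation, with the return estimates carrying gradients into the policy parameters; this is only one plausible instantiation of the constrained procedure described above.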