
Workshop: Deep Reinforcement Learning Workshop

Efficient Multi-Horizon Learning for Off-Policy Reinforcement Learning

Raja Farrukh Ali · Nasik Muhammad Nafi · Kevin Duong · William Hsu


Value estimates at multiple timescales can help create advanced discounting functions and allow agents to form more effective predictive models of their environment. In this work, we investigate learning over multiple horizons concurrently for off-policy deep reinforcement learning, using an efficient architecture that combines a deeper network with the crucial components of Rainbow, a popular value-based off-policy algorithm. Our agent uses advantage-based action selection and learns over multiple horizons simultaneously, estimating the advantage that defines its acting policy with either an exponential or a hyperbolic discounting function. We evaluate the approach on the Procgen benchmark, a collection of procedurally generated environments, to demonstrate its effectiveness and, in particular, to assess the agent's performance in previously unseen scenarios.
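The abstract does not state how per-horizon estimates are turned into a hyperbolically discounted policy. One known construction, used in earlier multi-horizon work such as Fedus et al. (2019), relies on the identity 1/(1 + kt) = ∫₀¹ (1/k) γ^(1/k − 1) γ^t dγ, so a hyperbolic discount can be approximated by a weighted sum of exponential discounts over a grid of γ values. The sketch below assumes one advantage head per γ, combines the heads with Riemann-sum weights for hyperbolic acting, and assumes that the exponential variant simply acts with the largest-γ head. The names `hyperbolic_weights`, `combine_heads`, and `select_action` are hypothetical; this is an illustration of the technique under those assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def hyperbolic_weights(gammas: Sequence[float], k: float) -> List[float]:
    """Weights w_i such that sum_i w_i * gammas[i] ** t approximates 1 / (1 + k * t).

    Discretizes 1 / (1 + k t) = integral_0^1 (1/k) g^(1/k - 1) g^t dg with a
    right-endpoint Riemann sum over the ascending grid `gammas` (starting at 0).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if any(not 0.0 < g < 1.0 for g in gammas) or list(gammas) != sorted(gammas):
        raise ValueError("gammas must be ascending and lie in (0, 1)")
    weights: List[float] = []
    prev = 0.0
    for g in gammas:
        density = (1.0 / k) * g ** (1.0 / k - 1.0)  # weighting density over gamma
        weights.append(density * (g - prev))        # density times interval width
        prev = g
    return weights


def combine_heads(per_gamma: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum over horizons of per-action estimates (one row per gamma)."""
    n_actions = len(per_gamma[0])
    return [sum(w * row[a] for w, row in zip(weights, per_gamma)) for a in range(n_actions)]


def select_action(
    per_gamma_adv: Sequence[Sequence[float]],
    gammas: Sequence[float],
    discounting: str = "hyperbolic",
    k: float = 1.0,
) -> int:
    """Greedy action from per-horizon advantage heads (hypothetical interface)."""
    if discounting == "exponential":
        # Assumption: the exponential agent acts with its largest-gamma head.
        best = max(range(len(gammas)), key=lambda i: gammas[i])
        adv = list(per_gamma_adv[best])
    elif discounting == "hyperbolic":
        adv = combine_heads(per_gamma_adv, hyperbolic_weights(gammas, k))
    else:
        raise ValueError(f"unknown discounting: {discounting}")
    # Argmax over actions; with a dueling head Q = V + A - mean(A), this equals argmax Q.
    return max(range(len(adv)), key=lambda a: adv[a])


if __name__ == "__main__":
    grid = [i / 1000 for i in range(1, 1000)]
    w = hyperbolic_weights(grid, k=1.0)
    for t in (0, 1, 5):
        approx = sum(wi * g ** t for wi, g in zip(w, grid))
        print(t, approx, 1 / (1 + t))
    # Two heads that disagree: the short horizon prefers action 0, the long one action 1.
    heads = [[3.0, 0.0], [0.0, 1.0]]
    print(select_action(heads, [0.5, 0.99], "exponential"))
    print(select_action(heads, [0.5, 0.99], "hyperbolic", k=1.0))
```

A right-endpoint sum is used because, for k > 1, the weighting density diverges at γ = 0; a denser grid near 0 and a largest γ closer to 1 both reduce the approximation error, at the cost of more heads.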
