Skip to yearly menu bar Skip to main content

Workshop: Deep Reinforcement Learning Workshop

Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization

Nasik Muhammad Nafi · Raja Farrukh Ali · William Hsu


Existence of policy-value representation asymmetry negatively affects the generalization capability of the traditional actor-critic architecture that uses a shared representation of policy and value. Fully separated networks for policy and value avoid overfitting by addressing this representation asymmetry. However, two separate networks introduce high computational overhead. Previous work has also shown that partial separation can achieve the same level of generalization in most tasks while reducing this computational overhead. Thus, the questions arise Do we really need two separate networks? Is there any particular scenario where only full separation works? In this work, we attempt to analyze the generalization performance compared to the extent of decoupling. We compare four different degrees of subnetwork separation, namely: fully shared; early separated, lately separated, and fully separated on the RL generalization benchmark Procgen, a suite of 16 procedurally-generated environments. We show that unless the environment has a distinct or explicit source of value estimation, partial separation can easily capture the necessary policy-value representation asymmetry and achieve better generalization performance in unseen scenarios.

Chat is not available.