Timezone: »

Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning
Braham Snyder · Yuke Zhu

The residual gradient algorithm (RG), gradient descent of the Mean Squared Bellman Error, brings robust convergence guarantees to bootstrapped value estimation. Meanwhile, the far more common semi-gradient algorithm (SG) suffers from well-known instabilities and divergence. Unfortunately, RG often converges slowly in practice. Baird (1995) proposed residual algorithms (RA), weighted averaging of RG and SG, to combine RG's robust convergence and SG's speed. RA works moderately well in the online setting. We find, however, that RA works disproportionately well in the offline setting. Concretely, we find that merely adding a variable residual component to SAC increases its score on D4RL gym tasks by a median factor of 54. We further show that using the minimum of ten critics lets our algorithm match SAC-$N$'s state-of-the-art returns using 50$\times$ less compute and no additional hyperparameters. In contrast, TD3+BC with the same minimum-of-ten-critics trick does not match SAC-$N$'s returns on a handful of environments.