

Poster in Workshop: 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models

Policy-Guided Diffusion

Matthew T Jackson · Michael Matthews · Cong Lu · Jakob Foerster · Shimon Whiteson

Keywords: [ Synthetic Data ] [ Offline Reinforcement Learning ] [ Reinforcement Learning ] [ Diffusion Models ]


Abstract:

Model-free methods for offline reinforcement learning typically suffer from value overestimation, resulting from generalization to out-of-sample state-action pairs. On the other hand, model-based methods must contend with compounding errors in transition dynamics, as the policy is rolled out using the learned model. As a solution, we propose policy-guided diffusion (PGD). Our method generates entire trajectories using a diffusion model, with an additional policy guidance term that biases samples towards the policy being trained. Evaluating PGD on the Adroit manipulation environment, we show that guidance dramatically increases trajectory likelihood under the target policy, without increasing model error. When training offline RL agents on purely synthetic data, our early results show that guidance leads to improvements in performance across datasets. We believe this approach is a step towards the training of offline agents on predominantly synthetic experience, minimizing the principal drawbacks of offline RL.
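To make the guidance idea concrete, below is a minimal, hypothetical sketch of a classifier-guidance-style denoising update for trajectories: the noise estimate from an unconditional trajectory diffusion model is shifted by the gradient of the target policy's action log-likelihood. The `denoiser` and `policy` interfaces, the tensor layout, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: policy-guided noise estimate for one denoising step.
# Assumes `denoiser(traj_t, t)` predicts the added noise for a noisy trajectory
# and `policy.log_prob(states, actions)` returns per-step action log-likelihoods.
import torch


def policy_guided_eps(denoiser, policy, traj_t, t, alpha_bar_t,
                      state_dim, guidance_scale=1.0):
    """Return a noise estimate biased towards the target policy.

    traj_t: noisy trajectory, shape (batch, horizon, state_dim + action_dim).
    The guidance term is the gradient of the policy's action log-likelihood
    with respect to the noisy trajectory (classifier-guidance style).
    """
    traj_t = traj_t.detach().requires_grad_(True)

    # Unconditional noise prediction from the trajectory diffusion model.
    eps = denoiser(traj_t, t)

    # Split the trajectory into state and action channels (assumed layout).
    states = traj_t[..., :state_dim]
    actions = traj_t[..., state_dim:]

    # Gradient of the target policy's action log-likelihood w.r.t. the
    # (noisy) trajectory, used to bias sampling towards on-policy actions.
    log_prob = policy.log_prob(states, actions).sum()
    grad = torch.autograd.grad(log_prob, traj_t)[0]

    # Shift the score estimate; the (1 - alpha_bar_t)^0.5 factor converts the
    # log-likelihood gradient into the noise-prediction parameterization.
    return eps - guidance_scale * ((1.0 - alpha_bar_t) ** 0.5) * grad
```

The guided noise estimate would then replace the unconditional one inside a standard DDPM sampling loop, so higher guidance scales trade diversity of the synthetic trajectories for closeness to the target policy.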
