Workshop: NeurIPS 2022 Workshop on Score-Based Methods

Abstract:
Model-based reinforcement learning (MBRL) exhibits favorable performance in practice, but its theoretical guarantees are mostly restricted to settings where the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unboundedly with time. In this work, we develop a novel MBRL method that (i) relaxes the assumptions on the target transition model, requiring only that it belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step, so that the posterior estimate consists of a \emph{Bayesian coreset} of only statistically significant past state-action pairs; and (iii) exhibits a Bayesian regret of $\mathcal{O}(dH^{1+({\alpha}/{2})}T^{1-({\alpha}/{2})})$ with a coreset of size $\Omega(\sqrt{T^{1+\alpha}})$, where $d$ is the aggregate dimension of the state-action space, $H$ is the episode length, $T$ is the total number of time steps experienced, and $\alpha\in (0,1]$ is a tuning parameter newly introduced into the analysis of MBRL in this work. To achieve these results, we adopt an approach based upon Stein's method, which allows distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to a $50\%$ reduction in wall-clock time in some continuous control environments.
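The closed-form evaluation mentioned in the abstract can be illustrated in isolation. The sketch below (not the paper's implementation; the helper name `ksd_squared`, the RBF kernel, the bandwidth, and the standard-normal target are all illustrative assumptions) computes the squared KSD between an empirical sample and a target distribution known only through its score function $\nabla_x \log p(x)$, which is the quantity Stein's method requires:

```python
import numpy as np

def ksd_squared(X, score, h=1.0):
    """Squared kernelized Stein discrepancy (V-statistic) between the
    empirical distribution of samples X (shape (n, d)) and a target known
    through its score function, using an RBF kernel with bandwidth h.

    Illustrative sketch, not the paper's implementation.
    """
    n, d = X.shape
    S = score(X)                              # target score at each sample, (n, d)
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences x_i - x_j, (n, n, d)
    sq = np.sum(diff**2, axis=-1)             # squared distances, (n, n)
    K = np.exp(-sq / (2 * h**2))              # RBF kernel matrix k(x_i, x_j)
    # Four terms of the Stein kernel u_p(x, y):
    t1 = (S @ S.T) * K                                  # s(x)^T s(y) k(x, y)
    t2 = np.einsum('id,ijd->ij', S, diff) * K / h**2    # s(x)^T grad_y k, grad_y k = (x-y) k / h^2
    t3 = -np.einsum('jd,ijd->ij', S, diff) * K / h**2   # s(y)^T grad_x k, grad_x k = -(x-y) k / h^2
    t4 = K * (d / h**2 - sq / h**4)                     # trace of grad_x grad_y k
    return np.mean(t1 + t2 + t3 + t4)                   # average over all (i, j) pairs

# Target: standard normal, whose score is s(x) = -x.
score_std_normal = lambda x: -x

rng = np.random.default_rng(0)
on_target = ksd_squared(rng.standard_normal((200, 2)), score_std_normal)
off_target = ksd_squared(rng.standard_normal((200, 2)) + 3.0, score_std_normal)
```

Samples drawn from the target yield a small discrepancy, while shifted samples yield a much larger one, which is what lets a coreset construction score how much each retained point contributes to matching the posterior.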
