Poster
Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
Kohei Miyaguchi
West Ballroom A-D #6307
Wed 11 Dec 11 a.m. PST — 2 p.m. PST
Abstract:
We propose a method for offline reinforcement learning (RL) with a performance guarantee that holds *without* any conditions on the data support. Without such conditions, offline RL becomes an ill-posed problem: it is in general impossible to estimate or optimize the conventional performance metric. To address this issue, we adopt the *worst-case policy value* as a new metric and constructively show that a sample complexity bound of $O(\epsilon^{-2}(1-\gamma)^{-4}\ln (1/\delta))$ is attainable without any data-support conditions, where $\epsilon$ is the suboptimality in the new metric and $\delta$ is the confidence level. Moreover, since the new metric generalizes the conventional one, the resulting algorithm can be used to solve the conventional offline RL problem as is. In this context, our sample complexity bound can also be seen as a strict improvement on previous bounds under single-policy concentrability and realizability.
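The abstract does not spell out the worst-case policy value; a minimal sketch of one standard pessimistic formulation it is consistent with (the notation $J_M$ and $\mathcal{M}_D$ is assumed here, not taken from the paper) is $\underline{J}(\pi) = \inf_{M \in \mathcal{M}_D} J_M(\pi)$, where $\mathcal{M}_D$ denotes the set of MDPs consistent with the data-generating distribution on its support and $J_M(\pi)$ is the discounted return of policy $\pi$ in MDP $M$; the learner then maximizes $\underline{J}(\pi)$ over policies. When the data support covers the whole state-action space, $\mathcal{M}_D$ collapses to the true MDP and $\underline{J}$ reduces to the conventional policy value, which is the sense in which the new metric generalizes the old one.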