
Understanding the Effects of Dataset Composition on Offline Reinforcement Learning
Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Angela Bitto · Philipp Renz · Vihang Patil · Sepp Hochreiter

The promise of Offline Reinforcement Learning (RL) lies in learning policies from fixed datasets, without interacting with the environment. Because no interaction is possible, the dataset is the most essential ingredient of the algorithm, as it directly affects the learned policies. Systematic studies of how dataset composition influences Offline RL algorithms have been missing. To this end, we conducted a comprehensive empirical analysis of the effect of dataset composition on the performance of Offline RL algorithms in discrete action environments. Performance is studied through two metrics of the datasets, Trajectory Quality (TQ) and State-Action Coverage (SACo). Our analysis suggests that variants of the off-policy Deep Q-Network family rely on the dataset exhibiting high SACo. In contrast, algorithms that constrain the learned policy towards the data-generating policy perform well across datasets that exhibit high TQ, high SACo, or both. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
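The two dataset metrics can be illustrated with a small sketch. The definitions below are simplified assumptions for a toy tabular setting (TQ as mean trajectory return normalized by a reference maximum return, SACo as the fraction of distinct state-action pairs present in the data); they are not the paper's exact formulas.

```python
# Toy dataset of trajectories; each transition is (state, action, reward).
# Metric definitions are illustrative assumptions, not the paper's formulas.
trajectories = [
    [(0, 1, 1.0), (1, 0, 0.0), (2, 1, 1.0)],
    [(0, 0, 0.0), (3, 1, 2.0)],
]

def trajectory_quality(trajs, max_return):
    """Mean trajectory return, normalized by a reference maximum return."""
    returns = [sum(r for _, _, r in t) for t in trajs]
    return sum(returns) / (len(returns) * max_return)

def state_action_coverage(trajs, num_states, num_actions):
    """Fraction of distinct (state, action) pairs seen in the dataset."""
    pairs = {(s, a) for t in trajs for s, a, _ in t}
    return len(pairs) / (num_states * num_actions)

tq = trajectory_quality(trajectories, max_return=2.0)
saco = state_action_coverage(trajectories, num_states=4, num_actions=2)
print(tq, saco)
```

High-TQ datasets come from near-expert behavior policies, while high-SACo datasets come from exploratory ones, so the two axes can vary independently.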

Author Information

Kajetan Schweighofer (Johannes Kepler University Linz)
Markus Hofmarcher (ELLIS Unit / University Linz)
Marius-Constantin Dinu (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Dynatrace Research)
Angela Bitto (JKU)
Philipp Renz (LIT AI Lab - JKU Linz)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Sepp Hochreiter (LIT AI Lab / University Linz / IARAI)
