Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter

In the real world, acting on the environment with a weak policy can be expensive or very risky, which hampers real-world applications of reinforcement learning. Offline Reinforcement Learning (RL) can learn policies from a given dataset without interacting with the environment. However, the dataset is the only source of information for an Offline RL algorithm and determines the performance of the learned policy. We still lack studies on how dataset characteristics influence different Offline RL algorithms. Therefore, we conducted a comprehensive empirical analysis of how dataset characteristics affect the performance of Offline RL algorithms in discrete-action environments. A dataset is characterized by two metrics: (1) the Trajectory Quality (TQ), measured by the average dataset return, and (2) the State-Action Coverage (SACo), measured by the number of unique state-action pairs. We found that variants of the off-policy Deep Q-Network family require datasets with high SACo to perform well. Algorithms that constrain the learned policy towards the given dataset perform well on datasets with high TQ or high SACo. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
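The two dataset metrics described in the abstract can be illustrated with a minimal sketch. It assumes a dataset stored as a list of trajectories, each a list of `(state, action, reward)` tuples with hashable states and discrete actions. The function names, the data layout, and the toy data are hypothetical and chosen only for illustration. The sketch computes the raw quantities named in the abstract (average return and count of unique state-action pairs) and does not attempt to reproduce any normalization the authors may apply.

```python
from typing import Hashable, List, Tuple

# Assumed layout (hypothetical): one transition is (state, action, reward).
Transition = Tuple[Hashable, int, float]
Trajectory = List[Transition]


def trajectory_quality(dataset: List[Trajectory]) -> float:
    """Average undiscounted return over all trajectories in the dataset."""
    if not dataset:
        raise ValueError("dataset must contain at least one trajectory")
    returns = [sum(reward for _, _, reward in traj) for traj in dataset]
    return sum(returns) / len(returns)


def state_action_coverage(dataset: List[Trajectory]) -> int:
    """Number of unique (state, action) pairs seen anywhere in the dataset."""
    return len({(state, action) for traj in dataset for state, action, _ in traj})


# Toy dataset (made up for illustration): states are grid coordinates.
toy_dataset: List[Trajectory] = [
    [((0, 0), 1, 0.0), ((0, 1), 1, 1.0)],
    [((0, 0), 1, 0.0), ((0, 1), 0, 0.0), ((1, 1), 1, 2.0)],
]

tq = trajectory_quality(toy_dataset)       # returns are 1.0 and 2.0
saco = state_action_coverage(toy_dataset)  # ((0, 0), 1) appears twice, counted once
print(tq, saco)
```

In this toy case the two trajectories share one state-action pair, so the coverage count is lower than the total number of transitions, which is the distinction SACo is meant to capture.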

Author Information

Kajetan Schweighofer (Johannes Kepler University Linz)
Markus Hofmarcher (ELLIS Unit / University Linz)
Marius-Constantin Dinu (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Dynatrace Research)
Philipp Renz (LIT AI Lab - JKU Linz)
Angela Bitto (JKU)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Sepp Hochreiter (LIT AI Lab / University Linz / IARAI)