
Workshop: Generalization in Planning (GenPlan '23)

Non-adaptive Online Finetuning for Offline Reinforcement Learning

Audrey Huang · Mohammad Ghavamzadeh · Nan Jiang · Marek Petrik

Keywords: [ Reinforcement Learning ] [ online finetuning ] [ offline RL ]


Offline reinforcement learning (RL) has emerged as an important framework for applying RL to real-life applications. However, the complete lack of online interactions causes technical difficulties. The online finetuning setting addresses these challenges by incorporating a limited form of online interaction, which is often available in practice. Unfortunately, current theoretical frameworks for online finetuning assume high online sample complexity, require deploying fully adaptive algorithms (i.e., unlimited policy changes), or both. This restricts their application to real-world settings where online interactions and policy updates are expensive and limited. In this paper, we develop a new framework for online finetuning. Instead of competing with the optimal policy (which inherits the high sample complexity and adaptivity requirements of online RL), we aim to learn a new policy that improves as much as possible over the existing policy, using a pre-specified number of online samples and a non-adaptive data-collection policy. Our formulation reveals surprising nuances and suggests novel principles that distinguish the finetuning problem from purely online and offline RL.
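To make the setting concrete, here is a minimal sketch of the non-adaptive finetuning protocol on a toy one-state problem (a multi-armed bandit), assuming rewards in [0, 1]. It is not the authors' algorithm. The function names, the fixed exploration distribution, and the Hoeffding-style pessimistic (lower-confidence-bound) selection rule are all hypothetical choices made for illustration. The key point it shows is that the entire online batch is drawn from a distribution fixed before any online data is seen (no policy changes during collection), and that the output is compared against the existing (baseline) policy rather than against the optimal one.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence


def lcb(rewards: List[float], delta: float) -> float:
    """Hoeffding lower confidence bound on the mean of rewards in [0, 1].

    Assumption: this bound is an illustrative choice, not taken from the paper.
    """
    if not rewards:
        return float("-inf")  # no data: be maximally pessimistic
    mean = sum(rewards) / len(rewards)
    return mean - math.sqrt(math.log(1.0 / delta) / (2.0 * len(rewards)))


def nonadaptive_finetune(
    offline: Dict[int, List[float]],
    sample_reward: Callable[[int], float],
    baseline_arm: int,
    budget: int,
    exploration: Sequence[float],
    delta: float = 0.05,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical toy version of non-adaptive online finetuning.

    offline:       pre-collected rewards per arm (the offline dataset)
    sample_reward: one online interaction with the environment
    baseline_arm:  the existing policy we try to improve on
    budget:        pre-specified number of online samples
    exploration:   data-collection distribution, fixed BEFORE collection
    """
    rng = rng or random.Random(0)
    arms = list(range(len(exploration)))

    # Non-adaptive collection: the whole batch is drawn up front from a
    # fixed distribution, so the data-collection policy never changes.
    batch = rng.choices(arms, weights=exploration, k=budget)

    # Merge offline data with the freshly collected online samples.
    data = {a: list(offline.get(a, [])) for a in arms}
    for a in batch:
        data[a].append(sample_reward(a))

    # Pessimistic improvement step: switch away from the baseline only if
    # another arm has a strictly higher lower confidence bound.
    best, best_val = baseline_arm, lcb(data[baseline_arm], delta)
    for a in arms:
        val = lcb(data[a], delta)
        if val > best_val:
            best, best_val = a, val
    return best


if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.9]  # hypothetical deterministic rewards
    chosen = nonadaptive_finetune(
        offline={0: [0.2] * 50},  # offline data covers only the baseline
        sample_reward=lambda a: true_means[a],
        baseline_arm=0,
        budget=200,
        exploration=[0.0, 0.5, 0.5],  # spend the budget on uncovered arms
    )
    print(chosen)
```

With a budget of zero, the uncovered arms have no data and the sketch falls back to the baseline; with a modest budget spent on the arms the offline data does not cover, it can certify an improvement. Breaking ties in favor of the baseline is one simple way to express "improve over the existing policy" rather than "match the optimal policy".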
