On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
Yifan Xu · Nicklas Hansen · Zirui Wang · Yung-Chieh Chan · Hao Su · Zhuowen Tu
Event URL: https://openreview.net/forum?id=CAFK5R65IX

Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, and it stands in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. With proper pretraining and concurrent cross-task online finetuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 73% in some instances.

Author Information

Yifan Xu (University of California, San Diego)
Nicklas Hansen (UC San Diego)
Zirui Wang (University of California, San Diego)
Zirui Wang

I am a final-year, final-quarter undergraduate student pursuing a B.S. in Data Science at the Halicioglu Data Science Institute (HDSI) and a B.A. in Cognitive Science in the Department of Cognitive Science at the University of California, San Diego (UCSD). My work focuses on methods and applications in AI/ML. I am a recipient of the HDSI Undergraduate Scholarship. Through my undergraduate studies and research, I have gained experience in Reinforcement Learning, Hierarchical Visual Reasoning, and Prompt Tuning. I am fortunate to be advised by Prof. Zhuowen Tu and Prof. Zhiting Hu at UCSD.

Yung-Chieh Chan (University of California, San Diego)
Hao Su (UCSD)
Zhuowen Tu (University of California, San Diego)
