Skip to yearly menu bar Skip to main content


Task-aware world model learning with meta weighting via bi-level optimization

Huining Yuan · Hongkun Dou · Xingyu Jiang · Yue Deng

Great Hall & Hall B1+B2 (level 1) #1304
[ ] [ Project Page ]
[ Paper [ Poster [ OpenReview
Wed 13 Dec 8:45 a.m. PST — 10:45 a.m. PST


Aligning the world model with the environment for the agent’s specific task is crucial in model-based reinforcement learning. While value-equivalent models may achieve better task awareness than maximum-likelihood models, they sacrifice a large amount of semantic information and face implementation issues. To combine the benefits of both types of models, we propose Task-aware Environment Modeling Pipeline with bi-level Optimization (TEMPO), a bi-level model learning framework that introduces an additional level of optimization on top of a maximum-likelihood model by incorporating a meta weighter network that weights each training sample. The meta weighter in the upper level learns to generate novel sample weights by minimizing a proposed task-aware model loss. The model in the lower level focuses on important samples while maintaining rich semantic information in state representations. We evaluate TEMPO on a variety of continuous and discrete control tasks from the DeepMind Control Suite and Atari video games. Our results demonstrate that TEMPO achieves state-of-the-art performance regarding asymptotic performance, training stability, and convergence speed.

Chat is not available.