Workshop: Intrinsically Motivated Open-ended Learning (IMOL) Workshop

Generative Intrinsic Optimization: Intrinsic Control with Model Learning

Jianfei Ma

Keywords: [ Model Learning ] [ intrinsic motivation ] [ Mutual Information ]


A future sequence represents the outcome of executing an action in the environment. When driven by the information-theoretic concept of mutual information, an agent seeks maximally informative consequences. The explicit outcome may take the form of a state, a return, or a trajectory, each serving a different purpose such as credit assignment or imitation learning. However, the inherent nature of combining intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach that jointly learns the quantity needed to estimate the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we focus mainly on theoretical analysis, our approach opens up the possibility of leveraging intrinsic control with model learning to enhance sample efficiency and to incorporate uncertainty about the environment into decision-making.
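
For orientation, variational approaches to mutual information are commonly built on the standard Barber-Agakov lower bound, sketched below. The notation is an assumption made for illustration and is not taken from this work: s is the current state, a the action drawn from a policy π, z the outcome of interest (a state, return, or trajectory), and q a learned variational posterior over actions. The objective actually used in the paper may differ.

```latex
% Standard variational lower bound on conditional mutual information.
% Symbols (s, a, z, \pi, q) are illustrative assumptions, not the paper's notation.
\begin{align}
I(A; Z \mid s)
  &= H(A \mid s) - H(A \mid Z, s) \\
  &\geq H(A \mid s)
     + \mathbb{E}_{a \sim \pi(\cdot \mid s),\; z \sim p(\cdot \mid s, a)}
       \bigl[ \log q(a \mid z, s) \bigr]
\end{align}
% The gap is the expected KL divergence between the true posterior
% p(a \mid z, s) and q(a \mid z, s), so the bound is tight when they match.
```

Under this reading, sampling z from p(z | s, a) is where a learned dynamics model would enter, and fitting q is the quantity estimated alongside it.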
