Deep learning has been at the root of significant progress in many application areas, such as computer perception and natural language processing. But almost all of these systems currently use supervised learning with human-curated labels. The challenge of the next several years is to let machines learn from raw, unlabeled data, such as images, videos and text. Intelligent systems today do not possess "common sense", which humans and animals acquire by observing the world, acting in it, and understanding its physical constraints. I will argue that allowing machines to learn predictive models of the world is key to significant progress in artificial intelligence, and a necessary component of model-based planning and reinforcement learning. The main technical difficulty is that the world is only partially predictable. A general formulation of unsupervised learning that deals with partial predictability will be presented. The formulation connects many well-known approaches to unsupervised learning, as well as new and exciting ones such as adversarial training.