

Qualcomm AI Research

Expo Talk Panel

Recent developments in embodied AI

Roland Memisevic

Exhibit Hall F
Tue 2 Dec 4 p.m. PST — 5 p.m. PST

Abstract:

Embodied AI is the study of systems that can perceive and interact with the physical world in real time. Real-world interactions pose unique challenges for AI systems since they naturally require a deep understanding of the physical world and/or its inhabitants. This understanding is often taken for granted in humans, where it is typically labelled as “intuitive physics” or “common sense”. It is widely agreed that solving this challenge would be as rewarding as it is hard, since it would be equivalent to creating truly capable “world models”, with countless applications in robotics, human-computer interaction, and even in advancing language modeling through concept grounding. Like other areas in AI, embodied AI has seen dramatic advances in recent years, fueled by the success of using pre-trained large language models as a central ingredient to allow for end-to-end training. While this development stands as one of many examples of the power of pre-trained language models, recently the converse has come true as well: embodied AI is increasingly being drawn on to understand real-world common sense and concept grounding in language models themselves, bringing back its early vision as a way to understand human-like cognition and world models.

This talk will provide an in-depth discussion of embodied AI, with a focus on recent advances based on multi-modal large language models. It will discuss how end-to-end training has made it possible to instill key aspects of real-world common sense in a model and how this has enabled highly ambitious use cases, such as generalist (“common sense”) robot control and real-world visual interaction (“chatbots that can see and hear you”). The talk will also discuss practical considerations, such as streaming inference at the edge, end-to-end training data generation and the role of reinforcement learning, as well as open challenges in state tracking and long-term memory.
