Physion: Evaluating Physical Prediction from Vision in Humans and Machines
Daniel Bear · Elias Wang · Damian Mrowca · Felix Binder · Hsiao-Yu Tung · Pramod RT · Cameron Holdaway · Sirui Tao · Kevin Smith · Fan-Yun Sun · Fei-Fei Li · Nancy Kanwisher · Josh Tenenbaum · Dan Yamins · Judith Fan
Event URL: https://openreview.net/forum?id=CXyZrKPz4CU

While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid and soft-body collisions, stable multi-object configurations, rolling, sliding, and projectile motion, thus providing a more comprehensive challenge than previous benchmarks. We used Physion to benchmark a suite of models varying in their architecture, learning objective, input-output structure, and training data. In parallel, we obtained precise measurements of human prediction behavior on the same set of scenarios, allowing us to directly evaluate how well any model could approximate human behavior. We found that vision algorithms that learn object-centric representations generally outperform those that do not, yet still fall far short of human performance. On the other hand, graph neural networks with direct access to physical state information both perform substantially better and make predictions that are more similar to those made by humans. These results suggest that extracting physical representations of scenes is the main bottleneck to achieving human-level and human-like physical understanding in vision algorithms. We have publicly released all data and code to facilitate the use of Physion to benchmark additional models in a fully reproducible manner, enabling systematic evaluation of progress towards vision algorithms that understand physical environments as robustly as people do.

Author Information

Daniel Bear (Stanford University)
Elias Wang (Stanford University)
Damian Mrowca (Stanford University)

Young children are excellent at playing, an ability to explore and (re)structure their environment that allows them to develop a remarkable visual and physical representation of their world that sets them apart from even the most advanced robots. Damian Mrowca is studying (1) representations and architectures that allow machines to efficiently develop an intuitive physical understanding of their world and (2) mechanisms that allow agents to learn such representations in a self-supervised way. Damian is a third-year PhD student co-advised by Prof. Fei-Fei Li and Prof. Daniel Yamins. He received his BSc (2012) and MSc (2015) in Electrical Engineering and Information Theory, both from the Technical University of Munich. During 2014-2015 he was a visiting student with Prof. Trevor Darrell at UC Berkeley. After a year in start-up land, looking to apply his research in businesses, he joined the Stanford Vision Lab and NeuroAILab in September 2016.

Felix Binder (UCSD)

I’m a third-year PhD student in the cognitive science department at UC San Diego. I’m working on visual and physical reasoning and mental simulation. Currently, I am working on agent-based reinforcement learning models for physical construction tasks (i.e., building structures using bricks), with a focus on how we might plan more efficiently by making use of the environment. My approach is best described as computational cognitive science: trying to discover the high-level algorithms of cognition.

Hsiao-Yu Tung (Carnegie Mellon University)
Pramod RT (MIT)
Cameron Holdaway
Sirui Tao (University of California, San Diego)
Kevin Smith (MIT)
Fan-Yun Sun (Stanford University)
Fei-Fei Li (Princeton University)
Nancy Kanwisher (MIT)
Josh Tenenbaum (MIT)

Josh Tenenbaum is an Associate Professor of Computational Cognitive Science at MIT in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 1999, and was an Assistant Professor at Stanford University from 1999 to 2002. He studies learning and inference in humans and machines, with the twin goals of understanding human intelligence in computational terms and bringing computers closer to human capacities. He focuses on problems of inductive generalization from limited data -- learning concepts and word meanings, inferring causal relations or goals -- and learning abstract knowledge that supports these inductive leaps in the form of probabilistic generative models or 'intuitive theories'. He has also developed several novel machine learning methods inspired by human learning and perception, most notably Isomap, an approach to unsupervised learning of nonlinear manifolds in high-dimensional data. He has been Associate Editor for the journal Cognitive Science, has been active on program committees for the CogSci and NIPS conferences, and has co-organized a number of workshops, tutorials and summer schools in human and machine learning. Several of his papers have received outstanding paper awards or best student paper awards at the IEEE Computer Vision and Pattern Recognition (CVPR), NIPS, and Cognitive Science conferences. He is the recipient of the New Investigator Award from the Society for Mathematical Psychology (2005), the Early Investigator Award from the Society of Experimental Psychologists (2007), and the Distinguished Scientific Award for Early Career Contribution to Psychology (in the area of cognition and human learning) from the American Psychological Association (2008).

Dan Yamins
Judith Fan (University of California, San Diego)

More from the Same Authors