Tutorial
Vision-Based Control, Control-Based Vision, and the Information Knot That Ties Them
Stefano Soatto

Mon Dec 06 03:30 PM -- 05:30 PM (PST) @ Regency D

The purpose of this tutorial is to explore the interplay between sensing and control, to highlight the "information knot" that ties them, and to design inference and learning algorithms to compute "representations" from data that are optimal, by design, for decision and control tasks. We will focus on visual sensing, but the analysis developed extends to other modalities.

We will first review various notions of information proposed in different fields from economic theory to perception psychology, and adapt them to decision and control tasks, as opposed to transmission and storage of data. We will see that for complex sensing phenomena, such as vision, nuisance factors play an important role, especially those that are not "invertible" such as occlusions of line-of-sight and quantization-scale. Handling of the nuisances brings forward a notion of "representation," whose complexity measures the amount of "actionable information" contained in the data. We will discuss how to build representations that are optimal by design, in the sense of retaining all and only the statistics that matter to the task. For "invertible" nuisances, such representations can be made lossless (not in the classical sense of distortion, but in the sense of optimal performance in a decision or control task). In some cases, these representations are supported on a thin set, which can help elucidate the "signal-to-symbol barrier" problem, and relate to a topology-based notion of "sparsity". However, non-invertible nuisances spoil the picture, requiring the introduction of a notion of "stability" of the representation with respect to non-invertible nuisances. This is not the classical notion of (bounded-input-bounded-output) stability from control theory, but instead relates to "structural stability" from catastrophe theory. The design of maximally stable statistics brings forward a notion of "proper sampling" of the data. However, this is not the traditional notion of proper sampling from Nyquist, but one related to persistent topology. Once an optimal representation is constructed, a bound on the risk or control functional can be derived, analogous to distortion in communications. The "currency" that trades off this error (the equivalent of the bit-rate in communication) is not the amount of data, but instead the "control authority" over the sensing process.
Thus, sensing and control are intimately tied: Actionable information drives the control process, and control of the sensing process is what allows computing a representation.

We will present case studies in which formulating visual decision problems (e.g. detection, localization, recognition, categorization) in the context of vision-based control leads to improved performance and reduced computational burden. They include established low-level vision tools (e.g. tracking, local invariant descriptors), robotic exploration, and action and activity recognition. We will describe some of these in detail and distribute source code at the workshop, together with course notes.

Author Information

Stefano Soatto (UCLA)

Stefano Soatto received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Assistant and then Associate Professor of Electrical Engineering and Biomedical Engineering at Washington University, and Research Associate in Applied Sciences at Harvard University. Between 1995 and 1998 he was also Ricercatore in the Department of Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. His general research interests are in Computer Vision and Nonlinear Estimation and Control Theory. In particular, he is interested in ways for computers to use sensory information to interact with humans and the environment. Dr. Soatto is the recipient of the David Marr Prize for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion. He received the National Science Foundation Career Award and the Okawa Foundation Grant. He is a Member of the Editorial Board of the International Journal of Computer Vision (IJCV) and Foundations and Trends in Computer Graphics and Vision. He is the founder and director of the UCLA Vision Lab; more information is available at http://vision.ucla.edu.
