Tutorial
Vision-Based Control, Control-Based Vision, and the Information Knot That Ties Them
Stefano Soatto
Regency D
The purpose of this tutorial is to explore the interplay between sensing and control, to highlight the "information knot" that ties them, and to design inference and learning algorithms that compute, from data, "representations" that are optimal by design for decision and control tasks. We will focus on visual sensing, but the analysis extends to other modalities.
We will first review various notions of information proposed in different fields, from economic theory to perception psychology, and adapt them to decision and control tasks, as opposed to transmission and storage of data. We will see that for complex sensing phenomena, such as vision, nuisance factors play an important role, especially those that are not "invertible," such as occlusions of line-of-sight and quantization scale. Handling the nuisances brings forward a notion of "representation," whose complexity measures the amount of "actionable information" contained in the data. We will discuss how to build representations that are optimal by design, in the sense of retaining all and only the statistics that matter to the task. For "invertible" nuisances, such representations can be made lossless (not in the classical sense of distortion, but in the sense of optimal performance in a decision or control task). In some cases, these representations are supported on a thin set, which can help elucidate the "signal-to-symbol barrier" problem, and relate to a topology-based notion of "sparsity." However, non-invertible nuisances spoil the picture, requiring the introduction of a notion of "stability" of the representation with respect to them. This is not the classical notion of (bounded-input-bounded-output) stability from control theory, but instead relates to "structural stability" from catastrophe theory. The design of maximally stable statistics brings forward a notion of "proper sampling" of the data. Again, this is not the traditional notion of proper sampling from Nyquist, but one related to persistent topology. Once an optimal representation is constructed, a bound on the risk or control functional can be derived, analogous to distortion in communications. The "currency" that trades off this error (the equivalent of the bit-rate in communication) is not the amount of data, but instead the "control authority" over the sensing process.
Thus, sensing and control are intimately tied: Actionable information drives the control process, and control of the sensing process is what allows computing a representation.
We will present case studies in which formulating visual decision problems (e.g., detection, localization, recognition, categorization) in the context of vision-based control leads to improved performance and reduced computational burden. The case studies include established low-level vision tools (e.g., tracking, local invariant descriptors), robotic exploration, and action and activity recognition. We will describe some of these in detail and distribute source code at the workshop, together with course notes.