University of Texas, Austin; The University of Texas; University of Texas, Austin
Workshop: Robust Statistical Learning
7:30am - 6:30pm Friday, December 10, 2010
Hilton: Mt Currie South
The core task of statistical machine learning is to infer conclusions from data, typically using statistical models that describe probabilistic relationships among the underlying variables. Such modeling allows us to make strong predictions even from limited data by leveraging specific problem structure. On the flip side, however, when the model assumptions do not hold exactly, the performance of the resulting methods may deteriorate severely. A simple example: even a few corrupted points, or points with a few corrupted entries, can severely throw off standard SVD-based PCA.
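The PCA example can be made concrete with a small synthetic sketch. Everything in it (the data, the noise levels, the single corrupted point, and the helper name `top_principal_direction`) is an assumption chosen for illustration, not something taken from the workshop material. It shows how one grossly corrupted point can rotate the leading principal direction found by SVD.

```python
import numpy as np


def top_principal_direction(X):
    """Return the leading principal direction of the rows of X (SVD-based PCA)."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]                                 # unit vector, sign is arbitrary


# Hypothetical clean data: large spread along x, tiny spread along y.
rng = np.random.default_rng(0)
n = 200
clean = np.column_stack([rng.normal(0.0, 5.0, n), rng.normal(0.0, 0.1, n)])

# The same data plus a single grossly corrupted point far out along y.
corrupted = np.vstack([clean, [0.0, 1000.0]])

v_clean = top_principal_direction(clean)
v_corrupted = top_principal_direction(corrupted)

# Absolute cosine between the two directions (1 = same axis, 0 = orthogonal).
alignment = abs(float(v_clean @ v_corrupted))

print("clean direction:    ", np.round(v_clean, 3))
print("corrupted direction:", np.round(v_corrupted, 3))
print("alignment:          ", round(alignment, 3))
```

In this toy setting the clean direction lies close to the x-axis, while the single outlier pulls the estimated direction close to the y-axis, so the two are nearly orthogonal.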
The goal of this workshop is to investigate this ``robust learning'' setting, in which the data deviate from the model assumptions in a variety of ways. Depending on what is known about the deviations, a spectrum of approaches is available:
(a) Dirty Models: Statistical models that impose ``clean'' structural assumptions such as sparsity, low rank, etc. have proven very effective at imposing bias without being overly restrictive. A superposition of two (or more) such clean models can provide a method that is also robust. For example, approximating data by the sum of a sparse matrix and a low-rank one leads to PCA that is robust to corrupted entries.
(b) Robust Optimization: Most statistical learning methods have, implicitly or explicitly, an underlying optimization problem. Robust optimization uses techniques from convexity and duality to construct solutions that are immunized against some bounded level of uncertainty, typically expressed as bounded (but otherwise arbitrary, i.e., adversarial) perturbations of the decision parameters.
(c) Classical Robust Statistics; Adversarial Learning: There is a large body of work on classical robust statistics, which develops estimation methods that are robust to misspecified modeling assumptions in general and do not model the outliers specifically. While this area is still quite active, it has a long history, with many results developed in the 1960s, 1970s and 1980s. There has also been significant recent work in adversarial machine learning.
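As a schematic illustration of approaches (a) and (b), the following formulations use notation that is assumed here and does not come from the workshop description: $M$ is the observed data matrix, $L$ a low-rank component, $S$ a sparse component, $\lambda > 0$ a trade-off weight, $x$ a decision variable, and $u$ an uncertain parameter confined to a bounded uncertainty set $\mathcal{U}$. One commonly used convex formulation of the sparse-plus-low-rank decomposition, and a generic worst-case (min-max) robust optimization problem, can be written as:

```latex
% (a) Sparse plus low-rank decomposition: nuclear norm encourages low rank,
%     entrywise l1 norm encourages sparsity.
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = M

% (b) Robust optimization: minimize the objective under the worst-case
%     perturbation u drawn from the bounded set U.
\min_{x} \; \max_{u \in \mathcal{U}} \; f(x, u)
```

Here $\|L\|_{*}$ denotes the nuclear norm (the sum of the singular values of $L$) and $\|S\|_{1}$ the sum of the absolute values of the entries of $S$. These are sketches of typical formulations only; specific methods differ in their constraints, norms, and choice of uncertainty set.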
Thus, while there has been a resurgence of robust learning methods (broadly understood) in recent years, it seems to be coming largely from different communities that rarely interact: (classical) robust statistics, adversarial machine learning, robust optimization, and multi-structured or dirty model learning. The aim of this workshop is to bring together researchers from these different communities and to identify common intuitions underlying such robust learning methods. Indeed, with increasingly high-dimensional and ``dirty'' real-world data that do not conform to clean modeling assumptions, this is a vital necessity.