Greg Shakhnarovich, Dhruv Batra, Brian Kulis, Kilian Q Weinberger

TTI-Chicago; TTI-Chicago; UC Berkeley; Washington University

Workshop: Beyond Mahalanobis: Supervised Large-Scale Learning of Similarity

7:30am – 8:00pm Friday, December 16, 2011

Melia Sierra Nevada: Guejar

The notion of similarity (or distance) is central to many problems in machine learning: information retrieval, nearest-neighbor-based prediction, visualization of high-dimensional data, etc. Historically, similarity was estimated via a fixed distance function (typically Euclidean), sometimes engineered by hand using domain knowledge. Using statistical learning methods to learn similarity functions instead is appealing, and over the last decade this problem has attracted much attention in the community, with several publications in NIPS, ICML, AISTATS, CVPR, etc.

Much of this work, however, has focused on a specific, restricted approach: learning a Mahalanobis distance, under a variety of objectives and constraints. This effectively limits the setup to learning a linear embedding of the data.
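To make the restriction concrete, here is a minimal sketch (not taken from any workshop contribution) of why a Mahalanobis distance corresponds to a linear embedding: when the matrix M is written as M = L^T L, the squared Mahalanobis distance between x and y equals the squared Euclidean distance between Lx and Ly. The matrix `L`, the points `x` and `y`, and all helper names below are illustrative assumptions.

```python
def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def gram(L):
    """Return M = L^T L, which is positive semidefinite by construction."""
    n = len(L[0])
    return [[sum(row[i] * row[j] for row in L) for j in range(n)]
            for i in range(n)]

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    return sum(a * b for a, b in zip(d, matvec(M, d)))

def euclidean_sq(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical linear map from 3-D to 2-D (illustrative values only).
L = [[1.0, 0.5, 0.0],
     [0.0, 2.0, -1.0]]
M = gram(L)

x = [1.0, 2.0, 3.0]
y = [0.0, -1.0, 4.0]

# Distance computed with M in the original space ...
d_mahalanobis = mahalanobis_sq(x, y, M)
# ... matches the plain Euclidean distance after the linear embedding by L.
d_embedded = euclidean_sq(matvec(L, x), matvec(L, y))
```

Under these assumptions, learning M amounts to learning the linear map L, which is the limitation the workshop aims to move past.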

In this workshop, we will look beyond this setup, and consider methods that learn non-linear embeddings of the data, either explicitly via non-linear mappings or implicitly via kernels. We will especially encourage discussion of methods that are suitable for the large-scale problems increasingly facing practitioners of learning methods: large numbers of examples, high dimensionality of the original space, and/or massively multi-class problems (e.g., classification with 10,000+ categories, or the 10,000,000 images of the ImageNet dataset).
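As a rough illustration of the explicit versus implicit distinction, the sketch below computes the same squared distance two ways: through an explicit non-linear feature map, and through a kernel, using the identity d^2(x, y) = k(x, x) - 2 k(x, y) + k(y, y). The choice of a homogeneous degree-2 polynomial kernel on 2-D inputs, the sample points, and all function names are assumptions made for the example only.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map for poly2_kernel on 2-D inputs:
    phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]

def kernel_distance_sq(x, y, kernel):
    """Squared distance in the kernel's feature space,
    computed without ever forming the features."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def euclidean_sq(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Illustrative points, not drawn from any dataset.
x = [1.0, 2.0]
y = [3.0, -1.0]

# Implicit route: only kernel evaluations are needed.
d_implicit = kernel_distance_sq(x, y, poly2_kernel)
# Explicit route: map both points, then measure Euclidean distance.
d_explicit = euclidean_sq(poly2_features(x), poly2_features(y))
```

The two routes agree up to floating-point error. The implicit route avoids building the feature vectors, which matters when the feature space is very high-dimensional, although it requires kernel evaluations between pairs of points.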

Our goals are to

1. Create a comprehensive understanding of the state-of-the-art in similarity learning, via presentation of recent work,
2. Initiate an in-depth discussion on major open questions brought up by research in this area. Among these questions:

* Are there gains to be made from introducing non-linearity into similarity models?
* When the underlying task is prediction (classification or regression), are similarity functions worth learning, instead of attacking the prediction task directly? A closely related question: when is it beneficial to use nearest-neighbor-based methods with learned similarity?
* What is the right loss (or objective) function to minimize in similarity learning?
* It is often claimed that inherent structure in real data (e.g. low-dimensional manifolds) makes learning easier. How, if at all, does this affect similarity learning?
* What are the similarities and distinctions between learning similarity functions and learning hash functions?
* What is the relationship between unsupervised similarity learning (often framed as dimensionality reduction) and supervised similarity learning?
* Are there models of learning nonlinear similarities for which bounds (e.g., generalization error, regret bounds) can be proven?
* What algorithmic techniques must be employed or developed to scale nonlinear similarity learning to extremely large data sets?

We will encourage the invited speakers to address these questions in their talks, and will steer the panel discussion towards some of these.

The target audience of this workshop consists of two (overlapping) groups:
-- practitioners of machine learning who deal with large-scale problems where the ability to predict similarity values more accurately is important, and
-- core machine learning researchers working on learning similarity/distance/metric and on similarity-based prediction methods.