Timezone: »
Many modern methods for prediction leverage nearest neighbor (NN) search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century in the “Book of Optics” by Alhazen. Today, NN methods remain popular, often as a cog in a bigger prediction machine, used for instance in recommendation systems, forecasting baseball player performance and election outcomes, survival analysis in healthcare, image inpainting, crowdsourcing, graphon estimation, and more. The popularity of NN methods is due in no small part to the proliferation of highquality fast approximate NN search methods that scale to highdimensional massive datasets typical of contemporary applications. Moreover, NN prediction readily pairs with methods that learn similarities, such as metric learning methods or Siamese networks. In fact, some wellknown pairings that result in nearest neighbor predictors that learn similarities include random forests and many boosting methods.
Despite the popularity, success, and age of nearest neighbor methods, our theoretical understanding of them is still surprisingly incomplete (perhaps much to the chagrin of the initial efforts of analysis by Fix, Hodges, Cover, and Hart) and can also be disconnected from what practitioners actually want or care about. Many successful approximate nearest neighbor methods in practice do not have known theoretical guarantees, and many of the guarantees for exact nearest neighbor methods do not readily handle approximation. Meanwhile, many applications use variations on NN methods, for which existing theory may not extend to, or for which existing theory is not easily usable by a practitioner. Suffice it to say, a lot is lost in translation between different communities working with NN methods.
In short, NN methods is an exciting field at the intersection of classical statistics, machine learning, data structures and domain specific expertise. The aim of this work is to bring together theoreticians and practitioners alike from these various different backgrounds with a diverse range of perspectives to bring everyone up to speed on:
 Best known statistical/computational guarantees (especially recent nonasymptotic results)
 Latest methods/systems that have been developed especially for fast approximate NN search that scale to massive datasets
 Various applications in which NN methods are heavily used as a critical component in prediction or inference
By gathering a diverse crowd, we hope attendees share their perspectives, identify ways to bridge theory and practice, and discuss avenues of future research.
Fri 9:00 a.m.  9:20 a.m.

A millennium of nearest neighbor methods – an introduction to the NIPS nearest neighbor workshop 2017
(Talk)
»
I give a brief history of nearest neighbor (NN) methods, starting from Alhazen’s “Book of Optics” in the 11th century, and leading up to present time. Surprisingly, the most general nonasymptotic results on NN prediction only recently emerged in 2014 in the work of Chaudhuri and Dasgupta. Turning toward “userfriendly” theory with an eye toward practitioners, I mention recent guarantees (20132015) in the contemporary applications of time series forecasting, online collaborative filtering, and medical image segmentation — in all three cases, clustering structure enables successful NN prediction. 
George H Chen 
Fri 9:20 a.m.  10:05 a.m.

Analyzing Robustness of Nearest Neighbors to Adversarial Examples
(Talk)
»
Motivated by applications such as autonomous vehicles, testtime attacks via adversarial examples have received a great deal of recent attention. In this setting, an adversary is capable of making queries to a classifier, and perturbs a test example by a small amount in order to force the classifier to report an incorrect label. While a long line of work has explored a number of attacks, not many reliable defenses are known, and there is an overall lack of general understanding about the foundations of designing machine learning algorithms robust to adversarial examples. In this talk, we take a step towards addressing this challenging question by introducing a new theoretical framework, analogous to biasvariance theory, which we can use to tease out the causes of vulnerability. We apply our framework to a simple classification algorithm: nearest neighbors, and analyze its robustness to adversarial examples. Motivated by our analysis, we propose a modified version of the nearest neighbor algorithm, and demonstrate both theoretically and empirically that it has superior robustness to standard nearest neighbors. 
Kamalika Chaudhuri 
Fri 10:05 a.m.  10:25 a.m.

New perspective from Blackwell's "comparisons of experiments" on generative adversarial networks
(Talk)
»
We bring the tools from Blackwell's seminal result on comparing two stochastic experiments, to shine new lights on the modern applications of great interest: generative adversarial networks (GAN). Binary hypothesis testing is at the center of GANs, and we propose new data processing inequalities that allows us to discover new algorithms to combat mode collapse, provide sharper analyses, and provide simpler proofs. This leads to a new architecture to handle one of the major challenges in GAN known as ``mode collapse''; the lack of diversity in the samples generated by the learned generators. The hypothesis testing view of GAN allows us to analyze the new architecture and show that it encourages generators with no mode collapse. Experimental results show that the proposed architecture can learn to generate samples with diversity that is orders of magnitude better than competing approaches, while being simpler. For this talk, I will assume no prior background on GANs. 
Sewoong Oh 
Fri 11:00 a.m.  11:45 a.m.

Datadependent methods for similarity search in high dimensions
(Talk)
»
I will give an overview of recent research on designing provable datadependent approximate similarity search methods for highdimensional data. Unlike many approaches proposed in the algorithms literature, the new methods lead to data structures that adapt to the data while retaining correctness and running time guarantees. The highlevel principle behind those methods is that every data set (even a completely random one) has some structure that can be exploited algorithmically. This interpolates between basic "dataoblivious" approaches (e.g., those based on simple random projections) that are analyzable but do not adapt easily to the structure of the data, and basic "datadependent" methods that exploit specific structure of the data, but might not perform well if this structure is not present. Concretely, I will cover two recent lines of research:  Datadependent LocalitySensitive Hashing: approximate nearest neighbor search methods that provably improve over the best possible dataoblivious LSH algorithms (covers work from SODA'14, STOC'15 and NIPS'15).  Datadependent metric compression: algorithms that represent sets of points using a small number of bits while approximately preserving the distances up to higher accuracy than previously known (covers work from SODA'17 and NIPS'17). 
Piotr Indyk 
Fri 11:45 a.m.  12:05 p.m.

Fast kNearest Neighbor Search via Prioritized DCI
(Talk)
»
Most exact methods for knearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) [19] offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to LocalitySensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21. 
Ke Li 
Fri 1:45 p.m.  2:30 p.m.

How to stop worrying and learn to love Nearest Neighbors
(Talk)
»
Nearest neighbors is an algorithm everyone loves to hate. It's too trivial, too bruteforce, doesn't offer any insights. In this talk, I will try to make you fall in love with the humble nearest neighbor. First, I will discuss the use of nearest neighbor as an exceptionally useful baseline to guard against our field's natural bias in favor of elegant algorithms over data. Then, I will discuss some scenarios when nearest neighbors is particularly useful in practice. 
Alexei Efros 
Fri 2:30 p.m.  2:50 p.m.

A Platform for Data Science
(Talk)
»
We describe an extensible data science platform based on nonparametric statistical methods such as nearest neighbors. This platform was borne out of the need for us to provide scalable and flexible predictive analytics solutions for retailers and the federal government. In this talk, I will describe the formalism associated with the platform, discuss its performance on benchmark datasets outofthebox and show a live demo for building a recommendation system and for detecting geospatial anomalies in maritime domain. 
Vishal Doshi 
Fri 2:50 p.m.  4:15 p.m.

Poster Session (encompasses coffee break)
(Poster Session)
»

Beidi Chen, Borja Balle, Daniel Lee, iuri frosio, Jitendra Malik, Jan Kautz, Ke Li, Masashi Sugiyama, Miguel A. CarreiraPerpinan, Ramin Raziperchikolaei, Theja Tulabandhula, YungKyun Noh, Adams Wei Yu

Fri 4:15 p.m.  5:00 p.m.

The Boundary Forest Algorithm
(Talk)
»
I explain the Boundary Forest algorithm, a simple, fast, and incremental online algorithm that can be used in both supervised and unsupervised settings, for classification, regression, or retrieval problems. As a classification algorithm, its generalization performance is at least as good as Knearestneighbors, while being able to respond to queries very quickly. It maintains trees of examples that let it train and respond to test queries in a time logarithmic with the number of stored examples, which is typically very much less than the number of training examples. 
Jonathan Yedidia 
Fri 5:00 p.m.  5:20 p.m.

Iterative Collaborative Filtering for Sparse Matrix Estimation
(Talk)
»
The sparse matrix estimation problem consists of estimating the distribution of an nbyn matrix Y , from a sparsely observed single instance of this matrix where the entries of Y are independent random variables. This captures a wide array of problems; special instances include matrix completion in the context of recommendation systems, graphon estimation, and community detection in (mixed membership) stochastic block models. Inspired by classical collaborative filtering for recommendation systems, we propose a novel iterative, collaborative filtering style algorithm for matrix estimation in this generic setting. Under model assumptions of uniform sampling, bounded entries, and finite spectrum, we provide bounds on the the mean squared error (MSE) of our estimator and show improved sample complexity. 
Christina Lee 
Author Information
George H Chen (Carnegie Mellon University)
George Chen is an assistant professor of information systems at Carnegie Mellon University. He works on nonparametric prediction methods, applied to healthcare and sustainable development. He received his PhD from MIT in Electrical Engineering and Computer Science.
Devavrat Shah (Massachusetts Institute of Technology)
Devavrat Shah is a professor of Electrical Engineering & Computer Science and Director of Statistics and Data Science at MIT. He received PhD in Computer Science from Stanford. He received Erlang Prize from Applied Probability Society of INFORMS in 2010 and NeuIPS best paper award in 2008.
Christina Lee (Microsoft Research)
More from the Same Authors

2020 Poster: Estimation of Skill Distribution from a Tournament »
Ali Jadbabaie · Anuran Makur · Devavrat Shah 
2020 Spotlight: Estimation of Skill Distribution from a Tournament »
Ali Jadbabaie · Anuran Makur · Devavrat Shah 
2020 Poster: Sample Efficient Reinforcement Learning via LowRank Matrix Estimation »
Devavrat Shah · Dogyoon Song · Zhi Xu · Yuzhe Yang 
2020 Demonstration: tspDB: Time Series Predict DB »
Anish Agarwal · Abdullah Alomar · Devavrat Shah 
2019 Poster: On Robustness of Principal Component Regression »
Anish Agarwal · Devavrat Shah · Dennis Shen · Dogyoon Song 
2019 Oral: On Robustness of Principal Component Regression »
Anish Agarwal · Devavrat Shah · Dennis Shen · Dogyoon Song 
2019 Poster: Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption »
Wei Ma · George H Chen 
2019 Tutorial: Synthetic Control »
Alberto Abadie · Vishal Misra · Devavrat Shah 
2018 Poster: Qlearning with Nearest Neighbors »
Devavrat Shah · Qiaomin Xie 
2017 Poster: Thy Friend is My Friend: Iterative Collaborative Filtering for Sparse Matrix Estimation »
Christian Borgs · Jennifer Chayes · Christina Lee · Devavrat Shah 
2016 Poster: Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering »
Dogyoon Song · Christina Lee · Yihua Li · Devavrat Shah 
2014 Workshop: Analysis of Rank Data: Confluence of Social Choice, Operations Research, and Machine Learning »
Shivani Agarwal · Hossein Azari Soufiani · Guy Bresler · Sewoong Oh · David Parkes · Arun Rajkumar · Devavrat Shah 
2014 Poster: Hardness of parameter estimation in graphical models »
Guy Bresler · David Gamarnik · Devavrat Shah 
2014 Poster: A Latent Source Model for Online Collaborative Filtering »
Guy Bresler · George H Chen · Devavrat Shah 
2014 Spotlight: A Latent Source Model for Online Collaborative Filtering »
Guy Bresler · George H Chen · Devavrat Shah 
2014 Poster: Learning Mixed Multinomial Logit Model from Ordinal Data »
Sewoong Oh · Devavrat Shah 
2014 Poster: Structure learning of antiferromagnetic Ising models »
Guy Bresler · David Gamarnik · Devavrat Shah 
2013 Workshop: Crowdsourcing: Theory, Algorithms and Applications »
Jennifer Wortman Vaughan · Greg Stoddard · ChienJu Ho · Adish Singla · Michael Bernstein · Devavrat Shah · Arpita Ghosh · Evgeniy Gabrilovich · Denny Zhou · Nikhil Devanur · Xi Chen · Alexander Ihler · Qiang Liu · Genevieve Patterson · Ashwinkumar Badanidiyuru Varadaraja · Hossein Azari Soufiani · Jacob Whitehill 
2013 Poster: A Latent Source Model for Nonparametric Time Series Classification »
George H Chen · Stanislav Nikolov · Devavrat Shah 
2013 Poster: Computing the Stationary Distribution Locally »
Christina Lee · Asuman Ozdaglar · Devavrat Shah 
2012 Poster: Iterative ranking from pairwise comparisons »
Sahand N Negahban · Sewoong Oh · Devavrat Shah 
2012 Spotlight: Iterative ranking from pairwise comparisons »
Sahand N Negahban · Sewoong Oh · Devavrat Shah 
2011 Poster: Iterative Learning for Reliable Crowdsourcing Systems »
David R Karger · Sewoong Oh · Devavrat Shah 
2011 Oral: Iterative Learning for Reliable Crowdsourcing Systems »
David R Karger · Sewoong Oh · Devavrat Shah 
2009 Poster: A DataDriven Approach to Modeling Choice »
Vivek Farias · Srikanth Jagabathula · Devavrat Shah 
2009 Spotlight: A DataDriven Approach to Modeling Choice »
Vivek Farias · Srikanth Jagabathula · Devavrat Shah 
2009 Poster: Local Rules for Global MAP: When Do They Work ? »
Kyomin Jung · Pushmeet Kohli · Devavrat Shah 
2008 Poster: Inferring rankings under constrained sensing »
Srikanth Jagabathula · Devavrat Shah 
2008 Oral: Inferring rankings under constrained sensing »
Srikanth Jagabathula · Devavrat Shah 
2007 Spotlight: Message Passing for Maxweight Independent Set »
Sujay Sanghavi · Devavrat Shah · Alan S Willsky 
2007 Poster: Message Passing for Maxweight Independent Set »
Sujay Sanghavi · Devavrat Shah · Alan S Willsky 
2007 Poster: Local Algorithms for Approximate Inference in MinorExcluded Graphs »
Kyomin Jung · Devavrat Shah