Many modern methods for prediction leverage nearest neighbor (NN) search to find past training examples most similar to a test example, an idea that in writing dates back at least to the 11th-century “Book of Optics” by Alhazen. Today, NN methods remain popular, often as a cog in a bigger prediction machine, used for instance in recommendation systems, forecasting baseball player performance and election outcomes, survival analysis in healthcare, image in-painting, crowdsourcing, graphon estimation, and more. The popularity of NN methods is due in no small part to the proliferation of high-quality, fast approximate NN search methods that scale to the high-dimensional, massive datasets typical of contemporary applications. Moreover, NN prediction readily pairs with methods that learn similarities, such as metric learning methods or Siamese networks; in fact, random forests and many boosting methods can themselves be viewed as nearest neighbor predictors that learn similarities.
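As a concrete illustration of the basic recipe (not tied to any particular talk below), here is a minimal k-NN classifier sketch in Python; the toy data, the Euclidean metric, and the choice of k are placeholders.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Label x_test by majority vote over its k nearest training examples."""
    dists = np.linalg.norm(X_train - x_test, axis=1)   # distances to every training point
    nearest = np.argsort(dists)[:k]                     # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy example: two small clusters in the plane
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9])))  # -> 1
```

Swapping the Euclidean distance for a learned similarity (e.g., from metric learning or a Siamese network) leaves the rest of the recipe unchanged, which is what makes these pairings so convenient.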
Despite the popularity, success, and age of nearest neighbor methods, our theoretical understanding of them is still surprisingly incomplete (perhaps to the chagrin of Fix, Hodges, Cover, and Hart, who made the initial efforts at analysis) and can also be disconnected from what practitioners actually want or care about. Many approximate nearest neighbor methods that are successful in practice do not have known theoretical guarantees, and many of the guarantees for exact nearest neighbor methods do not readily handle approximation. Meanwhile, many applications use variations on NN methods to which existing theory may not extend, or for which existing theory is not easily usable by a practitioner. Suffice it to say, a lot is lost in translation between the different communities working with NN methods.
In short, NN methods form an exciting field at the intersection of classical statistics, machine learning, data structures, and domain-specific expertise. The aim of this workshop is to bring together theoreticians and practitioners from these different backgrounds, with their diverse range of perspectives, and to bring everyone up to speed on:
- Best known statistical/computational guarantees (especially recent non-asymptotic results)
- Latest methods and systems developed for fast approximate NN search that scale to massive datasets
- Various applications in which NN methods are heavily used as a critical component in prediction or inference
By gathering a diverse crowd, we hope attendees share their perspectives, identify ways to bridge theory and practice, and discuss avenues of future research.
Fri 9:00 a.m. - 9:20 a.m.
A millennium of nearest neighbor methods – an introduction to the NIPS nearest neighbor workshop 2017 (Talk)
I give a brief history of nearest neighbor (NN) methods, starting from Alhazen’s “Book of Optics” in the 11th century and leading up to the present. Surprisingly, the most general non-asymptotic results on NN prediction emerged only recently, in the 2014 work of Chaudhuri and Dasgupta. Turning toward “user-friendly” theory with an eye toward practitioners, I mention recent guarantees (2013-2015) in the contemporary applications of time series forecasting, online collaborative filtering, and medical image segmentation; in all three cases, clustering structure enables successful NN prediction.
George H Chen
Fri 9:20 a.m. - 10:05 a.m.
Analyzing Robustness of Nearest Neighbors to Adversarial Examples (Talk)
Motivated by applications such as autonomous vehicles, test-time attacks via adversarial examples have received a great deal of recent attention. In this setting, an adversary is capable of making queries to a classifier, and perturbs a test example by a small amount in order to force the classifier to report an incorrect label. While a long line of work has explored a number of attacks, not many reliable defenses are known, and there is an overall lack of general understanding about the foundations of designing machine learning algorithms robust to adversarial examples. In this talk, we take a step towards addressing this challenging question by introducing a new theoretical framework, analogous to bias-variance theory, which we can use to tease out the causes of vulnerability. We apply our framework to a simple classification algorithm: nearest neighbors, and analyze its robustness to adversarial examples. Motivated by our analysis, we propose a modified version of the nearest neighbor algorithm, and demonstrate both theoretically and empirically that it has superior robustness to standard nearest neighbors.
Kamalika Chaudhuri
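The modified nearest neighbor algorithm from the talk is not spelled out in the abstract; purely as an illustration of the threat model (and not the speaker's method), the sketch below flips a plain 1-NN classifier's decision by nudging a test point a small distance toward its nearest differently-labeled training example.

```python
import numpy as np

def nn_label(X, y, q):
    """Plain 1-nearest-neighbor prediction."""
    return y[np.argmin(np.linalg.norm(X - q, axis=1))]

def small_perturbation(X, y, q, eps=0.1):
    """Move q by eps toward the closest training point whose label differs from the current prediction."""
    pred = nn_label(X, y, q)
    other = X[y != pred]
    target = other[np.argmin(np.linalg.norm(other - q, axis=1))]
    direction = (target - q) / np.linalg.norm(target - q)
    return q + eps * direction

X = np.array([[0.0, 0.0], [1.0, 0.0]])
y = np.array([0, 1])
q = np.array([0.45, 0.0])                              # originally closer to class 0
q_adv = small_perturbation(X, y, q, eps=0.1)
print(nn_label(X, y, q), "->", nn_label(X, y, q_adv))  # 0 -> 1 once the decision boundary is crossed
```

Queries that happen to lie near the 1-NN decision boundary are exactly the ones this kind of small perturbation can flip.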
Fri 10:05 a.m. - 10:25 a.m.
New perspective from Blackwell's "comparisons of experiments" on generative adversarial networks (Talk)
We bring the tools from Blackwell's seminal result on comparing two stochastic experiments to shine new light on a modern application of great interest: generative adversarial networks (GANs). Binary hypothesis testing is at the center of GANs, and we propose new data processing inequalities that allow us to discover new algorithms to combat mode collapse, provide sharper analyses, and provide simpler proofs. This leads to a new architecture for handling one of the major challenges in GANs known as “mode collapse”: the lack of diversity in the samples generated by the learned generators. The hypothesis testing view of GANs allows us to analyze the new architecture and show that it encourages generators with no mode collapse. Experimental results show that the proposed architecture can learn to generate samples with diversity that is orders of magnitude better than competing approaches, while being simpler. For this talk, I will assume no prior background on GANs.
Sewoong Oh
Fri 11:00 a.m. - 11:45 a.m.
Data-dependent methods for similarity search in high dimensions (Talk)
I will give an overview of recent research on designing provable data-dependent approximate similarity search methods for high-dimensional data. Unlike many approaches proposed in the algorithms literature, the new methods lead to data structures that adapt to the data while retaining correctness and running time guarantees. The high-level principle behind those methods is that every data set (even a completely random one) has some structure that can be exploited algorithmically. This interpolates between basic "data-oblivious" approaches (e.g., those based on simple random projections) that are analyzable but do not adapt easily to the structure of the data, and basic "data-dependent" methods that exploit specific structure of the data, but might not perform well if this structure is not present. Concretely, I will cover two recent lines of research:
- Data-dependent Locality-Sensitive Hashing: approximate nearest neighbor search methods that provably improve over the best possible data-oblivious LSH algorithms (covers work from SODA'14, STOC'15 and NIPS'15).
- Data-dependent metric compression: algorithms that represent sets of points using a small number of bits while approximately preserving the distances up to higher accuracy than previously known (covers work from SODA'17 and NIPS'17).
Piotr Indyk
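For contrast with the data-oblivious baseline mentioned above (simple random projections), here is a minimal random-hyperplane LSH sketch; it is a generic illustration, not one of the data-dependent constructions from the talk, and the single-table setup, number of hash bits, and toy data are arbitrary choices.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def build_lsh_table(X, num_bits=8):
    """Hash each point by the sign pattern of its projections onto random hyperplanes."""
    planes = rng.standard_normal((num_bits, X.shape[1]))
    table = defaultdict(list)
    for i, x in enumerate(X):
        table[tuple((planes @ x) > 0)].append(i)
    return planes, table

def lsh_query(planes, table, X, q):
    """Return the closest point among those sharing q's hash bucket (None if the bucket is empty)."""
    candidates = table.get(tuple((planes @ q) > 0), [])
    if not candidates:
        return None
    dists = np.linalg.norm(X[candidates] - q, axis=1)
    return candidates[int(np.argmin(dists))]

X = rng.standard_normal((1000, 32))
planes, table = build_lsh_table(X)
print(lsh_query(planes, table, X, X[42] + 0.01 * rng.standard_normal(32)))  # likely 42
```

In practice one builds many such tables and unions the matching buckets; the data-dependent methods in the talk go further by letting the hash functions themselves adapt to the dataset while keeping worst-case guarantees.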
Fri 11:45 a.m. - 12:05 p.m.
Fast k-Nearest Neighbor Search via Prioritized DCI (Talk)
Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) [19] offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21.
Ke Li
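The abstract above summarizes guarantees rather than the data structure itself; as a rough, simplified sketch of the underlying idea in DCI (index points by their projections onto a handful of random directions and retrieve candidates whose projections lie closest to the query's), one might write something like the following. The real Prioritized DCI maintains sorted projection lists and a priority queue over them; the parameters here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def build_index(X, num_dirs=10):
    """Project every point onto a few random unit directions."""
    dirs = rng.standard_normal((num_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return dirs, X @ dirs.T                # (n, num_dirs) array of projection values

def query(X, dirs, proj, q, candidates_per_dir=5):
    """Gather points whose projections are closest to q's along each direction, then rank them exactly.
    (A real implementation keeps each projection column sorted so this step needs no linear scan.)"""
    qp = q @ dirs.T
    cand = set()
    for j in range(dirs.shape[0]):
        cand.update(np.argsort(np.abs(proj[:, j] - qp[j]))[:candidates_per_dir].tolist())
    cand = list(cand)
    return cand[int(np.argmin(np.linalg.norm(X[cand] - q, axis=1)))]

X = rng.standard_normal((2000, 64))
dirs, proj = build_index(X)
print(query(X, dirs, proj, X[7] + 0.01 * rng.standard_normal(64)))  # likely recovers point 7
```

In this style of method it is the number of candidates examined, rather than the ambient dimension, that governs query time.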
Fri 1:45 p.m. - 2:30 p.m.
How to stop worrying and learn to love Nearest Neighbors (Talk)
Nearest neighbors is an algorithm everyone loves to hate. It's too trivial, too brute-force, doesn't offer any insights. In this talk, I will try to make you fall in love with the humble nearest neighbor. First, I will discuss the use of nearest neighbor as an exceptionally useful baseline to guard against our field's natural bias in favor of elegant algorithms over data. Then, I will discuss some scenarios when nearest neighbors is particularly useful in practice.
Alexei Efros
Fri 2:30 p.m. - 2:50 p.m.
A Platform for Data Science (Talk)
We describe an extensible data science platform based on non-parametric statistical methods such as nearest neighbors. This platform was born out of the need for us to provide scalable and flexible predictive analytics solutions for retailers and the federal government. In this talk, I will describe the formalism associated with the platform, discuss its out-of-the-box performance on benchmark datasets, and show a live demo of building a recommendation system and of detecting geo-spatial anomalies in the maritime domain.
Vishal Doshi
Fri 2:50 p.m. - 4:15 p.m.
Poster Session (encompasses coffee break)
Beidi Chen · Borja Balle · Daniel Lee · iuri frosio · Jitendra Malik · Jan Kautz · Ke Li · Masashi Sugiyama · Miguel A. Carreira-Perpinan · Ramin Raziperchikolaei · Theja Tulabandhula · Yung-Kyun Noh · Adams Wei Yu
Fri 4:15 p.m. - 5:00 p.m.
The Boundary Forest Algorithm (Talk)
I explain the Boundary Forest algorithm, a simple, fast, and incremental online algorithm that can be used in both supervised and unsupervised settings, for classification, regression, or retrieval problems. As a classification algorithm, its generalization performance is at least as good as that of K-nearest-neighbors, while it is able to respond to queries very quickly. It maintains trees of examples that let it train and respond to test queries in time logarithmic in the number of stored examples, which is typically much smaller than the number of training examples.
Jonathan Yedidia
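The abstract leaves the data structure implicit; the toy sketch below captures only the greedy traversal flavor (descend to whichever stored example among the current node and its children is closest to the query, and store a new training example as a child when the current answer is wrong). It is a single-tree simplification under assumed details, not the actual Boundary Forest algorithm.

```python
import numpy as np

class BoundaryTreeSketch:
    """Toy single tree: each node stores one (example, label) pair; queries descend greedily."""

    def __init__(self, x, y, max_children=5):
        self.x, self.y, self.children, self.max_children = x, y, [], max_children

    def _closest_node(self, q):
        node = self
        while True:
            best = min([node] + node.children, key=lambda n: np.linalg.norm(n.x - q))
            if best is node:
                return node
            node = best                     # move toward the query and keep descending

    def query(self, q):
        return self._closest_node(q).y

    def add(self, x, y):
        node = self._closest_node(x)
        if node.y != y and len(node.children) < node.max_children:
            node.children.append(BoundaryTreeSketch(x, y, node.max_children))

tree = BoundaryTreeSketch(np.array([0.0, 0.0]), 0)
for x, y in [([1.0, 1.0], 1), ([0.2, 0.1], 0), ([0.9, 1.2], 1)]:
    tree.add(np.array(x), y)
print(tree.query(np.array([0.8, 0.9])))     # -> 1
```

Because each query compares only against a node and its children at each level, lookups touch a small fraction of the stored examples, which is the source of the fast response times described above.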
Fri 5:00 p.m. - 5:20 p.m.
Iterative Collaborative Filtering for Sparse Matrix Estimation (Talk)
The sparse matrix estimation problem consists of estimating the distribution of an n-by-n matrix Y from a sparsely observed single instance of this matrix, where the entries of Y are independent random variables. This captures a wide array of problems; special instances include matrix completion in the context of recommendation systems, graphon estimation, and community detection in (mixed membership) stochastic block models. Inspired by classical collaborative filtering for recommendation systems, we propose a novel iterative, collaborative filtering-style algorithm for matrix estimation in this generic setting. Under model assumptions of uniform sampling, bounded entries, and finite spectrum, we provide bounds on the mean squared error (MSE) of our estimator and show improved sample complexity.
Christina Lee
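As a point of reference for the classical method that inspired the talk's iterative algorithm, here is a minimal neighbor-based collaborative filtering estimate: a missing entry is predicted by averaging that column over the rows whose overlapping observed entries agree most closely with the target row. All names and the toy matrix are illustrative; the talk's algorithm refines this idea to cope with much sparser observations.

```python
import numpy as np

def cf_estimate(Y, mask, i, j, k=2):
    """Estimate Y[i, j] by averaging column j over the k rows most similar to row i.
    Row similarity = mean squared difference over entries observed in both rows."""
    scores = []
    for u in range(Y.shape[0]):
        if u == i or not mask[u, j]:
            continue                        # need a row that actually observed column j
        overlap = mask[i] & mask[u]
        if not overlap.any():
            continue
        scores.append((np.mean((Y[i, overlap] - Y[u, overlap]) ** 2), u))
    neighbors = [u for _, u in sorted(scores)[:k]]
    return np.mean(Y[neighbors, j]) if neighbors else np.nan

# Toy 4x4 ratings matrix; mask marks which entries were observed
Y = np.array([[5, 4, 0, 1], [5, 5, 1, 1], [1, 1, 5, 5], [2, 1, 4, 5]], dtype=float)
mask = np.array([[1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]], dtype=bool)
print(cf_estimate(Y, mask, i=0, j=2))       # averages column 2 over the two rows most similar to row 0
```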
Author Information
George H Chen (Carnegie Mellon University)
George Chen is an assistant professor of information systems at Carnegie Mellon University. He works on nonparametric prediction methods, applied to healthcare and sustainable development. He received his PhD from MIT in Electrical Engineering and Computer Science.
Devavrat Shah (Massachusetts Institute of Technology)
Devavrat Shah is a professor of Electrical Engineering & Computer Science and Director of Statistics and Data Science at MIT. He received his PhD in Computer Science from Stanford. He received the Erlang Prize from the Applied Probability Society of INFORMS in 2010 and the NIPS best paper award in 2008.
Christina Lee (Microsoft Research)