Modern Nonparametric Methods in Machine Learning

Arthur Gretton · Mladen Kolar · Samory Kpotufe · John Lafferty · Han Liu · Bernhard Schölkopf · Alexander Smola · Rob Nowak · Mikhail Belkin · Lorenzo Rosasco · Peter Bickel · Yue Zhao

Harvey's Zephyr

Modern data acquisition routinely produces massive and complex datasets. Examples include data from high-throughput genomic experiments, climate data from worldwide data centers, robotic control data collected over time in adversarial settings, user-behavior data from social networks, user preferences in online markets, and so forth. Modern pattern recognition problems arising in such disciplines are characterized by large data sizes, large numbers of observed variables, and increased pattern complexity. Nonparametric methods, which can handle complex patterns in general, are therefore ever more relevant for modern data analysis. However, the larger data sizes and numbers of variables pose new challenges for nonparametric methods. The aim of this workshop is to bring together theoretical and applied researchers to discuss these modern challenges in detail, share insight on existing solutions, and lay out some of the important future directions.

Through a number of invited and contributed talks and a focused panel discussion, we plan to emphasize the importance of nonparametric methods and present the challenges they face today. In particular, we focus on the following aspects of nonparametric methods:

A. General motivations for nonparametric methods:

* the abundance of modern applications where little is known about data generating mechanisms (e.g., robotics, biology, social networks, recommendation systems)

* the ability of nonparametric analysis to capture general aspects of learning such as bias-variance tradeoffs, thus yielding general insight into the inherent complexity of various learning tasks.

B. Modern challenges for nonparametric methods:

* handling big data: while large data sizes are a blessing w.r.t. generalization performance, they also present a modern challenge for nonparametric learning w.r.t. time-efficiency. In this context, we need to characterize the trade-off between time and accuracy, create online or stream-based solutions, and develop approximation methods.

* larger problem complexity: large data is often paired with (1) large data dimension (number of observed variables), and (2) more complex target model spaces (e.g., less smooth regression functions). To handle large data dimensions, likely solutions are methods that perform nonlinear dimension reduction, perform nonparametric variable selection, or adapt to the intrinsic dimension of the data. To handle the increased complexity of target model spaces, we require model selection procedures that scale efficiently to modern data sizes while adapting to the complexity of the problem at hand.
