For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems involving many features. A variety of distributed algorithms have been proposed in this context, but challenges arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. We propose a MEdian Selection Subset AGgregation Estimator (message) algorithm, which attempts to solve these problems. The algorithm applies feature selection in parallel for each subset using Lasso or another method, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in both sample and feature size, and has theoretical guarantees. In particular, we show model selection consistency and coefficient estimation efficiency. Extensive experiments show excellent performance in variable selection, estimation, prediction, and computation time relative to the usual competitors.
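The four steps named in the abstract (select per subset, take the median inclusion, refit per subset, average) can be sketched roughly as follows. This is a minimal single-machine illustration, not the authors' implementation. The function names `lasso_cd` and `message_fit`, the fixed penalty `lam`, the use of ordinary least squares for the refit, the strict-majority reading of the median, and the simulated data are all assumptions made for the example. The loop over subsets stands in for work that would run in parallel on separate machines.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def message_fit(X, y, n_subsets, lam):
    """Hypothetical sketch of the select / median / refit / average steps."""
    parts = np.array_split(np.arange(X.shape[0]), n_subsets)
    # Step 1: feature selection on each subset (sequential here).
    inclusion = np.array([lasso_cd(X[i], y[i], lam) != 0 for i in parts])
    # Step 2: 'median' inclusion index across subsets. For 0/1 indicators
    # this is a majority vote; requiring a strict majority is an assumption.
    selected = np.median(inclusion.astype(float), axis=0) > 0.5
    beta = np.zeros(X.shape[1])
    if not selected.any():
        return selected, beta
    # Step 3: refit the selected features on each subset (OLS assumed).
    coefs = [np.linalg.lstsq(X[i][:, selected], y[i], rcond=None)[0]
             for i in parts]
    # Step 4: average the per-subset estimates.
    beta[selected] = np.mean(coefs, axis=0)
    return selected, beta

# Simulated data, for illustration only: 3 active features out of 20.
rng = np.random.default_rng(0)
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((600, 20))
y = X @ true_beta + 0.5 * rng.standard_normal(600)

selected, beta = message_fit(X, y, n_subsets=5, lam=0.2)
```

In this sketch only the per-subset inclusion vectors and the per-subset coefficient vectors would need to cross machine boundaries, which is the sense in which the communication cost is small. An odd number of subsets is used so that the median of the indicators is never a tie.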
Author Information
Xiangyu Wang (Google Inc.)
Peichao Peng (University of Pennsylvania)
David B Dunson (Duke University)
Related Events (a corresponding poster, oral, or spotlight)
- 2014 Poster: Median Selection Subset Aggregation for Parallel Inference
  Wed. Dec 10th 12:00 -- 04:59 AM Room Level 2, room 210D
More from the Same Authors
- 2016 Poster: Towards Unifying Hamiltonian Monte Carlo and Slice Sampling
  Yizhe Zhang · Xiangyu Wang · Changyou Chen · Ricardo Henao · Kai Fan · Lawrence Carin
- 2016 Poster: DECOrrelated feature space partitioning for distributed sparse regression
  Xiangyu Wang · David B Dunson · Chenlei Leng
- 2015 Poster: Parallelizing MCMC with Random Partition Trees
  Xiangyu Wang · Fangjian Guo · Katherine Heller · David B Dunson
- 2015 Poster: On the consistency theory of high dimensional variable screening
  Xiangyu Wang · Chenlei Leng · David B Dunson
- 2015 Poster: Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process
  Ye Wang · David B Dunson
- 2013 Poster: Locally Adaptive Bayesian Multivariate Time Series
  Daniele Durante · Bruno Scarpa · David B Dunson
- 2013 Poster: Multiscale Dictionary Learning for Estimating Conditional Distributions
  Francesca Petralia · Joshua T Vogelstein · David B Dunson
- 2012 Poster: Multiresolution Gaussian Processes
  Emily Fox · David B Dunson
- 2012 Poster: Repulsive Mixtures
  Francesca Petralia · Vinayak Rao · David B Dunson
- 2011 Poster: Generalized Beta Mixtures of Gaussians
  Artin Armagan · David B Dunson · Merlise Clyde
- 2011 Poster: The Kernel Beta Process
  Lu Ren · Yingjian Wang · David B Dunson · Lawrence Carin
- 2011 Spotlight: The Kernel Beta Process
  Lu Ren · Yingjian Wang · David B Dunson · Lawrence Carin
- 2011 Poster: Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
  XianXing Zhang · David B Dunson · Lawrence Carin
- 2010 Poster: Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
  Eric X Wang · Dehong Liu · Jorge G Silva · David B Dunson · Lawrence Carin
- 2009 Workshop: Nonparametric Bayes
  Dilan Gorur · Francois Caron · Yee Whye Teh · David B Dunson · Zoubin Ghahramani · Michael Jordan
- 2009 Poster: A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
  Lan Du · Lu Ren · David B Dunson · Lawrence Carin