Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for reducing the problem size is to first partition the dataset into subsets and then fit the model using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space). While the majority of the literature focuses on sample space partitioning, feature space partitioning is more effective when p >> n. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this paper, we solve these problems through a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
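The abstract's pipeline (decorrelate, partition the features into m blocks, fit each block independently) can be sketched in code. This is only an illustration of the idea as described above, not the authors' implementation: the exact decorrelation operator, the ridge term, the lambda choice, and the use of a plain coordinate-descent lasso as the per-worker solver are all assumptions made for the sketch.

```python
import numpy as np

def decorrelate(X, Y, ridge=1e-6):
    """Decorrelation step (sketch): multiply (X, Y) by F = ((X X^T)/p + r I)^{-1/2}.
    The small ridge term r is an assumption added for numerical stability."""
    n, p = X.shape
    G = X @ X.T / p + ridge * np.eye(n)
    w, V = np.linalg.eigh(G)            # symmetric eigendecomposition
    F = (V * w ** -0.5) @ V.T           # inverse matrix square root
    return F @ X, F @ Y

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent lasso; stands in for 'any algorithm designed
    for high-dimensional problems' run inside each worker."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def deco_fit(X, Y, m=2, lam=None):
    """DECO sketch: decorrelate, partition features into m blocks,
    fit each block independently, and concatenate the estimates."""
    n, p = X.shape
    if lam is None:
        lam = np.sqrt(2.0 * n * np.log(p))   # generic lasso-scale choice (assumption)
    Xt, Yt = decorrelate(X, Y)
    beta = np.zeros(p)
    for idx in np.array_split(np.arange(p), m):   # feature-space partition
        beta[idx] = lasso_cd(Xt[:, idx], Yt, lam) # embarrassingly parallel in principle
    return beta
```

In a real deployment each block's fit would run on a separate worker; the loop here is sequential only for simplicity, and any refinement stage the full method may use is omitted.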
Author Information
Xiangyu Wang (Duke University)
David B Dunson (Duke University)
Chenlei Leng (University of Warwick)
More from the Same Authors
- 2016 Poster: Towards Unifying Hamiltonian Monte Carlo and Slice Sampling
  Yizhe Zhang · Xiangyu Wang · Changyou Chen · Ricardo Henao · Kai Fan · Lawrence Carin
- 2015 Poster: Parallelizing MCMC with Random Partition Trees
  Xiangyu Wang · Fangjian Guo · Katherine Heller · David B Dunson
- 2015 Poster: On the consistency theory of high dimensional variable screening
  Xiangyu Wang · Chenlei Leng · David B Dunson
- 2015 Poster: Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process
  Ye Wang · David B Dunson
- 2014 Poster: Median Selection Subset Aggregation for Parallel Inference
  Xiangyu Wang · Peichao Peng · David B Dunson
- 2014 Oral: Median Selection Subset Aggregation for Parallel Inference
  Xiangyu Wang · Peichao Peng · David B Dunson
- 2014 Poster: Convex Optimization Procedure for Clustering: Theoretical Revisit
  Changbo Zhu · Huan Xu · Chenlei Leng · Shuicheng Yan
- 2013 Poster: Locally Adaptive Bayesian Multivariate Time Series
  Daniele Durante · Bruno Scarpa · David B Dunson
- 2013 Poster: Provable Subspace Clustering: When LRR meets SSC
  Yu-Xiang Wang · Huan Xu · Chenlei Leng
- 2013 Spotlight: Provable Subspace Clustering: When LRR meets SSC
  Yu-Xiang Wang · Huan Xu · Chenlei Leng
- 2013 Poster: Multiscale Dictionary Learning for Estimating Conditional Distributions
  Francesca Petralia · Joshua T Vogelstein · David B Dunson
- 2012 Poster: Multiresolution Gaussian Processes
  Emily Fox · David B Dunson
- 2012 Poster: Repulsive Mixtures
  Francesca Petralia · Vinayak Rao · David B Dunson
- 2011 Poster: Generalized Beta Mixtures of Gaussians
  Artin Armagan · David B Dunson · Merlise Clyde
- 2011 Poster: The Kernel Beta Process
  Lu Ren · Yingjian Wang · David B Dunson · Lawrence Carin
- 2011 Spotlight: The Kernel Beta Process
  Lu Ren · Yingjian Wang · David B Dunson · Lawrence Carin
- 2011 Poster: Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
  XianXing Zhang · David B Dunson · Lawrence Carin
- 2010 Poster: Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
  Eric X Wang · Dehong Liu · Jorge G Silva · David B Dunson · Lawrence Carin
- 2009 Workshop: Nonparametric Bayes
  Dilan Gorur · Francois Caron · Yee Whye Teh · David B Dunson · Zoubin Ghahramani · Michael Jordan
- 2009 Poster: A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
  Lan Du · Lu Ren · David B Dunson · Lawrence Carin