The Dirichlet process mixture of Gaussians (DPMG) has been used in the literature for clustering and density estimation. However, many real-world data sets exhibit cluster distributions that cannot be captured by a single Gaussian. Modeling such data sets with DPMG creates several extraneous clusters, even when the clusters are relatively well defined. Herein, we present the infinite mixture of infinite Gaussian mixtures (I2GMM) for more flexible modeling of data sets with skewed and multi-modal cluster distributions. Instead of using a single Gaussian for each cluster, as in the standard DPMG model, the generative model of I2GMM uses a single DPMG for each cluster. The individual DPMGs are linked together by centering their base distributions at the atoms of a higher-level DP prior. Inference is performed by a collapsed Gibbs sampler, which also enables partial parallelization. Experimental results on several artificial and real-world data sets suggest that the proposed I2GMM model predicts clusters more accurately than existing variational Bayes and Gibbs sampler versions of DPMG.
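To make the two-level generative process concrete, the sketch below draws synthetic data from an I2GMM-style construction using Chinese restaurant process (CRP) draws at both levels: a top-level CRP assigns points to clusters whose atoms are drawn from a broad Gaussian base measure, and a per-cluster CRP assigns points to Gaussian components whose means are centered at the cluster's atom. This is a minimal illustration under assumed hyper-parameters (`alpha`, `gamma`, `tau`, `sigma2`) and an assumed function name `sample_i2gmm`; it is not the authors' implementation, and the collapsed Gibbs sampler used for inference in the paper is not shown.

```python
# Minimal sketch (assumption): forward sampling from a two-level,
# I2GMM-style generative process via CRP draws at both levels.
import numpy as np

rng = np.random.default_rng(0)

def crp_assign(counts, concentration):
    """Draw a table index from a CRP given current table counts;
    index == len(counts) means 'open a new table'."""
    weights = np.append(counts, concentration)
    return rng.choice(len(weights), p=weights / weights.sum())

def sample_i2gmm(n, dim=2, alpha=1.0, gamma=1.0, tau=25.0, sigma2=0.25):
    cluster_counts = []   # top-level DP: customers per cluster
    cluster_means = []    # cluster atoms drawn from the top-level base measure
    comp_counts = []      # per-cluster component counts (lower-level DPs)
    comp_means = []       # per-cluster Gaussian component means
    data, labels = [], []

    for _ in range(n):
        # Top level: pick (or create) a cluster.
        k = crp_assign(np.array(cluster_counts), alpha)
        if k == len(cluster_counts):
            cluster_counts.append(0)
            cluster_means.append(rng.normal(0.0, np.sqrt(tau), dim))
            comp_counts.append([])
            comp_means.append([])
        cluster_counts[k] += 1

        # Lower level: within cluster k, pick (or create) a Gaussian component
        # whose base distribution is centered at the cluster's atom.
        j = crp_assign(np.array(comp_counts[k]), gamma)
        if j == len(comp_counts[k]):
            comp_counts[k].append(0)
            comp_means[k].append(cluster_means[k] + rng.normal(0.0, 1.0, dim))
        comp_counts[k][j] += 1

        # Emit an observation from the chosen component.
        data.append(rng.normal(comp_means[k][j], np.sqrt(sigma2)))
        labels.append(k)

    return np.array(data), np.array(labels)

X, y = sample_i2gmm(500)
print(X.shape, np.bincount(y))
```

Because each cluster owns its own lower-level mixture, a single cluster can be skewed or multi-modal while still being represented by one top-level atom, which is the flexibility the abstract attributes to I2GMM.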
Author Information
Halid Z Yerebakan (IUPUI)
Bartek Rajwa (Purdue University)
Murat Dundar (IUPUI)
More from the Same Authors
- 2014 Poster: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices
  Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich
- 2014 Spotlight: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices
  Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich
- 2006 Demonstration: Computer Aided Diagnosis of Early Stage Cancer
  Marcos Salganicoff · Luca Bogoni · R. Bharat Rao · Murat Dundar · Balaji R Krishnapuram · Glenn Fung
- 2006 Poster: Multiple Instance Learning for Computer Aided Diagnosis
  Glenn Fung · Murat Dundar · Balaji R Krishnapuram · R. Bharat Rao