Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
Author Information
Zhiyue Zhang (Duke University)
Kenneth Lange (UCLA)
Jason Xu (Duke University)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Simple and Scalable Sparse k-means Clustering via Feature Ranking
  Tue. Dec 8th 05:00 -- 07:00 PM Room Poster Session 1 #275
More from the Same Authors
- 2021 Spotlight: Uniform Concentration Bounds toward a Unified Framework for Robust Clustering
  Debolina Paul · Saptarshi Chakraborty · Swagatam Das · Jason Xu
- 2021 Poster: Uniform Concentration Bounds toward a Unified Framework for Robust Clustering
  Debolina Paul · Saptarshi Chakraborty · Swagatam Das · Jason Xu
- 2017 Poster: Generalized Linear Model Regression under Distance-to-set Penalties
  Jason Xu · Eric Chi · Kenneth Lange
- 2017 Spotlight: Generalized Linear Model Regression under Distance-to-set Penalties
  Jason Xu · Eric Chi · Kenneth Lange