Simple and Scalable Sparse k-means Clustering via Feature Ranking
Zhiyue Zhang · Kenneth Lange · Jason Xu

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #275

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.

Author Information

Zhiyue Zhang (Duke University)
Kenneth Lange (UCLA)
Jason Xu (Duke University)
