NeurIPS 2020: Simple and Scalable Sparse k-means Clustering via Feature Ranking




Zhiyue Zhang, Kenneth Lange, Jason Xu

Spotlight presentation: Orals & Spotlights Track 05: Clustering/Ranking
on 2020-12-08, 07:00 to 07:10 (UTC-08:00)

Poster Session 2
on 2020-12-08, 09:00 to 11:00 (UTC-08:00)


Abstract: Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
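
The abstract does not spell out the algorithm itself. As a rough illustration of what "sparse k-means via feature ranking" can look like, here is a minimal Python/NumPy sketch under stated assumptions: features are standardized, each Lloyd iteration ranks features by their between-cluster sum of squares and keeps only the top `s`, the other center coordinates are reset to the global mean (0 after standardization), and random restarts are compared by the retained between-cluster sum of squares. The function name `sparse_kmeans_ranked` and its parameters (`s`, `n_init`, `max_iter`, `seed`) are hypothetical, and this is not the authors' reference implementation.

```python
import numpy as np


def sparse_kmeans_ranked(X, k, s, n_init=20, max_iter=100, seed=0):
    """Hypothetical sketch: k-means that keeps only the s top-ranked features.

    Assumptions (not taken from the paper): columns are standardized here,
    no column is constant, features are ranked by between-cluster sum of
    squares, and unselected features get center 0 (the global mean).
    Returns (labels, selected_features, centers).
    """
    X = np.asarray(X, dtype=float)
    # Standardize so every feature has mean 0 and unit variance.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Random start: k distinct data points serve as initial centers.
        centers = X[rng.choice(n, size=k, replace=False)].copy()
        labels = np.full(n, -1)
        for _it in range(max_iter):
            # Assignment: nearest center in squared Euclidean distance. Features
            # whose centers are all 0 add the same amount for every cluster,
            # so only the selected features influence the argmin.
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            new_labels = dists.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Update: ordinary cluster means (an empty cluster keeps its center).
            counts = np.bincount(labels, minlength=k)
            for j in range(k):
                if counts[j] > 0:
                    centers[j] = X[labels == j].mean(axis=0)
            # Ranking: with centered data, sum_j n_j * mean_jl**2 is the
            # between-cluster sum of squares contributed by feature l.
            scores = (counts[:, None] * centers ** 2).sum(axis=0)
            selected = np.argsort(scores)[::-1][:s]
            # Sparsify: unselected features fall back to the global mean 0.
            mask = np.zeros(p, dtype=bool)
            mask[selected] = True
            centers[:, ~mask] = 0.0
        # Compare restarts by the between-cluster sum of squares they retain.
        objective = scores[selected].sum()
        if best is None or objective > best[0]:
            best = (objective, labels.copy(), np.sort(selected), centers.copy())
    return best[1], best[2], best[3]


# Usage: two clusters separated only in features 0 and 1; features 2..5 are noise.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[:, :2] += np.where(truth == 0, 4.0, -4.0)[:, None]
labels, selected, centers = sparse_kmeans_ranked(X, k=2, s=2)
```

Fixing the number of retained features `s` replaces a continuous shrinkage penalty with a single integer choice; how the paper itself chooses this level is not covered by the abstract.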
