Timezone: »
Poster
Sparse Embedded $k$-Means Clustering
Weiwei Liu · Xiaobo Shen · Ivor Tsang
The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min \{n,d\}\epsilon^2 log(d)/k$ for data matrix $X \in \mathbb{R}^{n\times d} $ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}(\frac{ndk}{\epsilon^2log(d)})$ for matrix multiplication and this cost will be prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires $\mathcal{O}(nnz(X))$ ($nnz(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate $k$-means clustering, while achieving satisfactory clustering performance.
Author Information
Weiwei Liu (The University of New South Wales)
Xiaobo Shen (NJUST)
Ivor Tsang (University of Technology, Sydney)
More from the Same Authors
-
2022 Poster: Defending Against Adversarial Attacks via Neural Dynamic System »
Xiyuan Li · Zou Xin · Weiwei Liu -
2022 Poster: On the Tradeoff Between Robustness and Fairness »
Xinsong Ma · Zekai Wang · Weiwei Liu -
2022 Poster: On Robust Multiclass Learnability »
Jingyuan Xu · Weiwei Liu -
2020 Poster: Graph Cross Networks with Vertex Infomax Pooling »
Maosen Li · Siheng Chen · Ya Zhang · Ivor Tsang -
2020 Oral: Graph Cross Networks with Vertex Infomax Pooling »
Maosen Li · Siheng Chen · Ya Zhang · Ivor Tsang -
2020 Poster: Subgroup-based Rank-1 Lattice Quasi-Monte Carlo »
Yueming LYU · Yuan Yuan · Ivor Tsang -
2019 Poster: Copula Multi-label Learning »
Weiwei Liu -
2018 Poster: Masking: A New Perspective of Noisy Supervision »
Bo Han · Jiangchao Yao · Gang Niu · Mingyuan Zhou · Ivor Tsang · Ya Zhang · Masashi Sugiyama -
2018 Poster: Co-teaching: Robust training of deep neural networks with extremely noisy labels »
Bo Han · Quanming Yao · Xingrui Yu · Gang Niu · Miao Xu · Weihua Hu · Ivor Tsang · Masashi Sugiyama -
2015 Poster: On the Optimality of Classifier Chain for Multi-label Classification »
Weiwei Liu · Ivor Tsang