Timezone: »
Greedy algorithms such as adaptive sampling (k-means++) and furthest point traversal are popular choices for clustering problems. One the one hand, they possess good theoretical approximation guarantees, and on the other, they are fast and easy to implement. However, one main issue with these algorithms is the sensitivity to noise/outliers in the data. In this work we show that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms result in nearly identical guarantees, while additionally being robust to outliers. For instance, in the case of k-means++, we show that a simple thresholding operation on the distances suffices to obtain an O(\log k) approximation to the objective. We obtain similar results for the simpler k-center problem. Finally, we show experimentally that our algorithms are easy to implement and scale well. We also measure their ability to identify noisy points added to a dataset.
Author Information
Aditya Bhaskara (University of Utah)
Sharvaree Vadgama (University of Utah)
Hong Xu (University of Utah)
More from the Same Authors
-
2023 Poster: Tight Bounds for Volumetric Spanners and Applications »
Aditya Bhaskara · Sepideh Mahabadi · Ali Vakilian -
2021 Poster: Logarithmic Regret from Sublinear Hints »
Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit -
2020 Poster: Adaptive Probing Policies for Shortest Path Routing »
Aditya Bhaskara · Sreenivas Gollapudi · Kostas Kollias · Kamesh Munagala -
2020 Poster: Online Linear Optimization with Many Hints »
Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit -
2020 Poster: Online MAP Inference of Determinantal Point Processes »
Aditya Bhaskara · Amin Karbasi · Silvio Lattanzi · Morteza Zadimoghaddam -
2019 Poster: On Distributed Averaging for Stochastic k-PCA »
Aditya Bhaskara · Pruthuvi Maheshakya Wijewardena -
2016 Poster: Linear Relaxations for Finding Diverse Elements in Metric Spaces »
Aditya Bhaskara · Mehrdad Ghadiri · Vahab Mirrokni · Ola Svensson -
2014 Poster: Distributed Balanced Clustering via Mapping Coresets »
Mohammadhossein Bateni · Aditya Bhaskara · Silvio Lattanzi · Vahab Mirrokni