Timezone: »

Greedy Sampling for Approximate Clustering in the Presence of Outliers
Aditya Bhaskara · Sharvaree Vadgama · Hong Xu

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #33

Greedy algorithms such as adaptive sampling (k-means++) and furthest point traversal are popular choices for clustering problems. One the one hand, they possess good theoretical approximation guarantees, and on the other, they are fast and easy to implement. However, one main issue with these algorithms is the sensitivity to noise/outliers in the data. In this work we show that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms result in nearly identical guarantees, while additionally being robust to outliers. For instance, in the case of k-means++, we show that a simple thresholding operation on the distances suffices to obtain an O(\log k) approximation to the objective. We obtain similar results for the simpler k-center problem. Finally, we show experimentally that our algorithms are easy to implement and scale well. We also measure their ability to identify noisy points added to a dataset.

Author Information

Aditya Bhaskara (University of Utah)
Sharvaree Vadgama (University of Utah)
Hong Xu (University of Utah)

More from the Same Authors