Skip to yearly menu bar Skip to main content


Oral

Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search

Benjamin Moseley · Joshua Wang

Abstract: Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, there is a lack of an analytical foundation for the method. Having such a foundation would both support the methods currently used and guide future improvements. This paper gives an applied algorithmic foundation for hierarchical clustering. The goal of this paper is to give an analytic framework supporting observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main results are that one of the most popular algorithms used in practice, average-linkage agglomerative clustering, has a small constant approximation ratio. Further, this paper establishes that using recursive $k$-means divisive clustering has a very poor lower bound on its approximation ratio, perhaps explaining why is it not as popular in practice. Motivated by the poor performance of $k$-means, we seek to find divisive algorithms that do perform well theoretically and this paper gives two constant approximation algorithms. This paper represents some of the first work giving a foundation for hierarchical clustering algorithms used in practice.

Chat is not available.