`

Timezone: »

 
Poster
Affinity Clustering: Hierarchical Clustering at Scale
Mohammadhossein Bateni · Soheil Behnezhad · Mahsa Derakhshan · MohammadTaghi Hajiaghayi · Raimondas Kiveris · Silvio Lattanzi · Vahab Mirrokni

Tue Dec 05 06:30 PM -- 10:30 PM (PST) @ Pacific Ballroom #27 #None
Graph clustering is a fundamental task in many data-mining and machine-learning pipelines. In particular, identifying a good hierarchical structure is at the same time a fundamental and challenging problem for several applications. The amount of data to analyze is increasing at an astonishing rate each day. Hence there is a need for new solutions to efficiently compute effective hierarchical clusterings on such huge data. The main focus of this paper is on minimum spanning tree (MST) based clusterings. In particular, we propose affinity, a novel hierarchical clustering based on Boruvka's MST algorithm. We prove certain theoretical guarantees for affinity (as well as some other classic algorithms) and show that in practice it is superior to several other state-of-the-art clustering algorithms. Furthermore, we present two MapReduce implementations for affinity. The first one works for the case where the input graph is dense and takes constant rounds. It is based on a Massively Parallel MST algorithm for dense graphs that improves upon the state-of-the-art algorithm of Lattanzi et al. (SPAA 2011). Our second algorithm has no assumption on the density of the input graph and finds the affinity clustering in $O(\log n)$ rounds using Distributed Hash Tables (DHTs). We show experimentally that our algorithms are scalable for huge data sets, e.g., for graphs with trillions of edges.

Author Information

Mohammadhossein Bateni (Google research)
Soheil Behnezhad (University of Maryland)
Mahsa Derakhshan (University of Maryland)
MohammadTaghi Hajiaghayi (University of Maryland)
Raimondas Kiveris (Google research)
Silvio Lattanzi (Google Research)
Vahab Mirrokni (Google Research NYC)

More from the Same Authors