Poster
Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Laxman Dhulipala · David Eisenstat · Jakub Lacki · Vahab Mirrokni · Jessica Shi
Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly sequential nature of the algorithm. In this paper, we address this issue and present ParHAC, the first efficient parallel HAC algorithm with sublinear depth for the widely used average-linkage function. In particular, we provide a $(1+\epsilon)$-approximation algorithm for this problem on $m$-edge graphs using $\tilde{O}(m)$ work and poly-logarithmic depth. Moreover, we show that obtaining similar bounds for exact average-linkage HAC is not possible under standard complexity-theoretic assumptions.
We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare it with several state-of-the-art sequential and parallel baselines. On a broad set of large publicly available real-world datasets, we find that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm. We also show that ParHAC can cluster one of the largest publicly available graph datasets, with 124 billion edges, in a little over three hours using a commodity multicore machine.
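To make the problem statement concrete, the following is a minimal sequential sketch of approximate average-linkage HAC on a weighted graph. It is not ParHAC and makes no attempt at parallelism or at the stated work bound. It assumes the similarity between two clusters is the total weight of edges between them divided by the product of their sizes, and it assumes that a $(1+\epsilon)$-approximate merge is any merge whose similarity is at least the current maximum divided by $(1+\epsilon)$. The function name `approx_average_linkage_hac` and the input format are hypothetical, chosen only for illustration.

```python
from itertools import count


def approx_average_linkage_hac(n, edges, eps=0.1):
    """Sequential sketch (assumption: not the paper's algorithm).

    n     -- number of vertices, labelled 0..n-1
    edges -- iterable of (u, v, weight) similarity edges
    eps   -- approximation slack; eps=0 gives exact greedy merging

    Returns a list of merges (a, b, new_cluster_id, similarity).
    """
    size = {v: 1 for v in range(n)}
    # total[a][b] = total edge weight between clusters a and b
    total = {v: {} for v in range(n)}
    for u, v, w in edges:
        if u != v:
            total[u][v] = total[u].get(v, 0.0) + w
            total[v][u] = total[v].get(u, 0.0) + w

    def sim(a, b):
        # Average linkage: cut weight normalised by the number of vertex pairs.
        return total[a][b] / (size[a] * size[b])

    next_id = count(n)
    merges = []
    while True:
        pairs = [(a, b) for a in total for b in total[a] if a < b]
        if not pairs:
            break
        best = max(sim(a, b) for a, b in pairs)
        threshold = best / (1.0 + eps)
        # Any pair meeting the threshold is an allowed merge; take the first.
        a, b = next(p for p in pairs if sim(*p) >= threshold)
        s = sim(a, b)

        c = next(next_id)
        size[c] = size[a] + size[b]
        merged = {}
        for old in (a, b):
            for nbr, w in total[old].items():
                if nbr != a and nbr != b:
                    merged[nbr] = merged.get(nbr, 0.0) + w
                    del total[nbr][old]
            del total[old]
        total[c] = merged
        for nbr, w in merged.items():
            total[nbr][c] = w
        merges.append((a, b, c, s))
    return merges


if __name__ == "__main__":
    # Two tight pairs joined by one weak edge.
    toy_edges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 0.1)]
    for merge in approx_average_linkage_hac(4, toy_edges, eps=0.1):
        print(merge)
```

Each iteration of this sketch rescans all cluster pairs, which is far slower than the near-linear work the abstract describes. It is meant only to show what the merge rule and the approximation slack mean.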
Author Information
Laxman Dhulipala (UMD)
David Eisenstat (Google)
Jakub Lacki (Google)
Vahab Mirrokni (Google Research)
Jessica Shi (Massachusetts Institute of Technology)
More from the Same Authors
2022 Poster: Posted Pricing and Dynamic Prior-independent Mechanisms with Value Maximizers
Yuan Deng · Vahab Mirrokni · Hanrui Zhang
2022 : Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
Alessandro Epasto · Vahab Mirrokni · Bryan Perozzi · Anton Tsitsulin · Peilin Zhong
2022 Poster: Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
Alessandro Epasto · Vahab Mirrokni · Bryan Perozzi · Anton Tsitsulin · Peilin Zhong
2022 Poster: Stars: Tera-Scale Graph Building for Clustering and Learning
CJ Carey · Jonathan Halcrow · Rajesh Jayaram · Vahab Mirrokni · Warren Schudy · Peilin Zhong
2022 Poster: Near-Optimal Private and Scalable $k$-Clustering
Vincent Cohen-Addad · Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong
2022 Poster: Anonymous Bandits for Multi-User Systems
Hossein Esfandiari · Vahab Mirrokni · Jon Schneider
2022 Poster: Cluster Randomized Designs for One-Sided Bipartite Experiments
Jennifer Brennan · Vahab Mirrokni · Jean Pouget-Abadie
2021 Poster: Robust Auction Design in the Auto-bidding World
Santiago Balseiro · Yuan Deng · Jieming Mao · Vahab Mirrokni · Song Zuo
2021 Poster: Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Nick Doudchenko · Khashayar Khosravi · Jean Pouget-Abadie · Sébastien Lahaie · Miles Lubin · Vahab Mirrokni · Jann Spiess · Guido Imbens
2021 Poster: Parallelizing Thompson Sampling
Amin Karbasi · Vahab Mirrokni · Mohammad Shadravan
2020 Poster: Faster DBSCAN via subsampled similarity queries
Heinrich Jiang · Jennifer Jang · Jakub Lacki
2020 : Multi-core parallel graph clustering
Jakub Lacki
2020 : Graph algorithms in the distributed setting
Jakub Lacki
2020 : Community Detection
Jakub Lacki
2020 Expo Workshop: Mining and Learning with Graphs at Scale
Vahab Mirrokni · Bryan Perozzi · Jakub Lacki · Jonathan Halcrow · Jaqui C Herman