Timezone: »
Poster
Subquadratic High-Dimensional Hierarchical Clustering
Amir Abboud · Vincent Cohen-Addad · Hussein Houdrouge
Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #40
We consider the widely-used average-linkage, single-linkage, and Ward's methods for
computing hierarchical clusterings of high-dimensional Euclidean inputs.
It is easy to show that there is no efficient implementation of these algorithms
in high dimensional Euclidean space since it implicitly requires to solve the closest
pair problem, a notoriously difficult problem.
However, how fast can these algorithms be implemented if we allow approximation?
More precisely: these algorithms successively merge the clusters that are at closest
average (for average-linkage), minimum distance (for single-linkage), or inducing the least sum-of-square error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm can merge $\gamma$-approximate closest clusters (namely, clusters that are at distance (average, minimum, or sum-of-square error) at most $\gamma$ times the distance of the closest clusters).
We show that one can indeed take advantage of the relaxation and compute the approximate hierarchical clustering tree using $\widetilde{O}(n)$ $\gamma$-approximate nearest neighbor queries.
This leads to an algorithm running in time $\widetilde{O}(nd) + n^{1+O(1/\gamma)}$ for $d$-dimensional Euclidean space.
We then provide experiments showing that these algorithms perform as well as the non-approximate version for classic classification tasks while achieving a significant speed-up.
Author Information
Amir Abboud (IBM research)
Vincent Cohen-Addad (CNRS & Sorbonne Université)
Hussein Houdrouge (Ecole Polytechnique)
More from the Same Authors
-
2021 Spotlight: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2022 : Scalable and Improved Algorithms for Individually Fair Clustering »
Mohammadhossein Bateni · Vincent Cohen-Addad · Alessandro Epasto · Silvio Lattanzi -
2022 Poster: Improved Coresets for Euclidean $k$-Means »
Vincent Cohen-Addad · Kasper Green Larsen · David Saulpic · Chris Schwiegelshohn · Omar Ali Sheikh-Omar -
2022 Poster: Near-Optimal Correlation Clustering with Privacy »
Vincent Cohen-Addad · Chenglin Fan · Silvio Lattanzi · Slobodan Mitrovic · Ashkan Norouzi-Fard · Nikos Parotsidis · Jakub Tarnawski -
2022 Poster: Near-Optimal Private and Scalable $k$-Clustering »
Vincent Cohen-Addad · Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong -
2021 Poster: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2021 Poster: Parallel and Efficient Hierarchical k-Median Clustering »
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson -
2020 Poster: Fast and Accurate $k$-means++ via Rejection Sampling »
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson -
2020 Poster: Impossibility Results for Grammar-Compressed Linear Algebra »
Amir Abboud · Arturs Backurs · Karl Bringmann · Marvin Künnemann -
2020 Poster: On the Power of Louvain in the Stochastic Block Model »
Vincent Cohen-Addad · Adrian Kosowski · Frederik Mallmann-Trenn · David Saulpic -
2019 Poster: Fully Dynamic Consistent Facility Location »
Vincent Cohen-Addad · Niklas Oskar D Hjuler · Nikos Parotsidis · David Saulpic · Chris Schwiegelshohn -
2018 Poster: Clustering Redemption–Beyond the Impossibility of Kleinberg’s Axioms »
Vincent Cohen-Addad · Varun Kanade · Frederik Mallmann-Trenn -
2017 Poster: Hierarchical Clustering Beyond the Worst-Case »
Vincent Cohen-Addad · Varun Kanade · Frederik Mallmann-Trenn