Timezone: »
Hiererachical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, recently Dasgupta [1] proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective (see also [2, 3, 4]). In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic blockmodel (HSBM), and show that in certain regimes the SVD approach of McSherry [5] combined with specific linkage methods results in a clustering that give an O(1)-approximation to Dasgupta’s cost function. We also show that an approach based on SDP relaxations for balanced cuts based on the work of Makarychev et al. [6], combined with the recursive sparsest cut algorithm of Dasgupta, yields an O(1) approximation in slightly larger regimes and also in the semi-random setting, where an adversary may remove edges from the random graph generated according to an HSBM. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.
Author Information
Vincent Cohen-Addad (University of Copenhagen)
Varun Kanade (University of Oxford)
Frederik Mallmann-Trenn (ENS)
More from the Same Authors
-
2021 Spotlight: Towards optimally abstaining from prediction with OOD test examples »
Adam Kalai · Varun Kanade -
2021 Spotlight: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2022 : When are Local Queries Useful for Robust Learning? »
Pascale Gourdeau · Varun Kanade · Marta Kwiatkowska · James Worrell -
2022 : Scalable and Improved Algorithms for Individually Fair Clustering »
Mohammadhossein Bateni · Vincent Cohen-Addad · Alessandro Epasto · Silvio Lattanzi -
2023 Poster: Multi-Swap k-Means++ »
Lorenzo Beretta · Vincent Cohen-Addad · Silvio Lattanzi · Nikos Parotsidis -
2023 Poster: Partial Matrix Completion »
Elad Hazan · Adam Tauman Kalai · Varun Kanade · Clara Mohri · Y. Jennifer Sun -
2023 Poster: Private estimation algorithms for stochastic block models and mixture models »
Hongjie Chen · Vincent Cohen-Addad · Tommaso d'Orsi · Alessandro Epasto · Jacob Imola · David Steurer · Stefan Tiegel -
2022 Poster: Improved Coresets for Euclidean $k$-Means »
Vincent Cohen-Addad · Kasper Green Larsen · David Saulpic · Chris Schwiegelshohn · Omar Ali Sheikh-Omar -
2022 Poster: When are Local Queries Useful for Robust Learning? »
Pascale Gourdeau · Varun Kanade · Marta Kwiatkowska · James Worrell -
2022 Poster: Near-Optimal Correlation Clustering with Privacy »
Vincent Cohen-Addad · Chenglin Fan · Silvio Lattanzi · Slobodan Mitrovic · Ashkan Norouzi-Fard · Nikos Parotsidis · Jakub Tarnawski -
2022 Poster: Near-Optimal Private and Scalable $k$-Clustering »
Vincent Cohen-Addad · Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong -
2021 Poster: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2021 Poster: Parallel and Efficient Hierarchical k-Median Clustering »
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson -
2021 Poster: Towards optimally abstaining from prediction with OOD test examples »
Adam Kalai · Varun Kanade -
2020 Poster: The Statistical Complexity of Early-Stopped Mirror Descent »
Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini -
2020 Spotlight: The Statistical Complexity of Early-Stopped Mirror Descent »
Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini -
2018 Poster: Clustering Redemption–Beyond the Impossibility of Kleinberg’s Axioms »
Vincent Cohen-Addad · Varun Kanade · Frederik Mallmann-Trenn -
2017 Poster: From which world is your graph »
Cheng Li · Felix MF Wong · Zhenming Liu · Varun Kanade