Timezone: »
Poster
Parallel and Efficient Hierarchical k-Median Clustering
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson
As a fundamental unsupervised learning task, hierarchical clustering has been extensively studied in the past decade. In particular, standard metric formulations as hierarchical $k$-center, $k$-means, and $k$-median received a lot of attention and the problems have been studied extensively in different models of computation. Despite all this interest, not many efficient parallel algorithms are known for these problems. In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution. Furthermore, we also get that for all $k$ simultanuously the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta \log (\Delta d n))$ factor bigger that the corresponding optimal solution. The algorithm requires in $O\left(\log_{s} (nd\log(n+\Delta))\right)$ rounds. Here $d$ is the dimension of the data set and $\Delta$ is the ratio between the maximum and minimum distance of two points in the input dataset. To the best of our knowledge, this is the first \emph{parallel} algorithm for the hierarchical $k$-median problem with theoretical guarantees. We further complement our theoretical results with an empirical study of our algorithm that shows its effectiveness in practice.
Author Information
Vincent Cohen-Addad (Google research)
Silvio Lattanzi (Google Research)
Ashkan Norouzi-Fard (Google Research)
Christian Sohler (TU Dortmund)
Ola Svensson (EPFL)
More from the Same Authors
-
2021 Spotlight: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2022 : Scalable and Improved Algorithms for Individually Fair Clustering »
Mohammadhossein Bateni · Vincent Cohen-Addad · Alessandro Epasto · Silvio Lattanzi -
2022 Poster: Improved Coresets for Euclidean $k$-Means »
Vincent Cohen-Addad · Kasper Green Larsen · David Saulpic · Chris Schwiegelshohn · Omar Ali Sheikh-Omar -
2022 Poster: Active Learning of Classifiers with Label and Seed Queries »
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice · Maximilian Thiessen -
2022 Poster: Near-Optimal Correlation Clustering with Privacy »
Vincent Cohen-Addad · Chenglin Fan · Silvio Lattanzi · Slobodan Mitrovic · Ashkan Norouzi-Fard · Nikos Parotsidis · Jakub Tarnawski -
2022 Poster: Near-Optimal Private and Scalable $k$-Clustering »
Vincent Cohen-Addad · Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong -
2022 Poster: Efficient and Stable Fully Dynamic Facility Location »
Sayan Bhattacharya · Silvio Lattanzi · Nikos Parotsidis -
2021 Poster: Online Facility Location with Multiple Advice »
Matteo Almanza · Flavio Chierichetti · Silvio Lattanzi · Alessandro Panconesi · Giuseppe Re -
2021 Poster: Nearly-Tight and Oblivious Algorithms for Explainable Clustering »
Buddhima Gamlath · Xinrui Jia · Adam Polak · Ola Svensson -
2021 Poster: Robust Online Correlation Clustering »
Silvio Lattanzi · Benjamin Moseley · Sergei Vassilvitskii · Yuyan Wang · Rudy Zhou -
2021 Poster: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2021 Poster: Efficient and Local Parallel Random Walks »
Michael Kapralov · Silvio Lattanzi · Navid Nouri · Jakab Tardos -
2021 Poster: Streaming Belief Propagation for Community Detection »
Yuchen Wu · Jakab Tardos · Mohammadhossein Bateni · André Linhares · Filipe Miguel Goncalves de Almeida · Andrea Montanari · Ashkan Norouzi-Fard -
2021 Poster: On Margin-Based Cluster Recovery with Oracle Queries »
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice -
2020 Poster: Fully Dynamic Algorithm for Constrained Submodular Optimization »
Silvio Lattanzi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakub Tarnawski · Morteza Zadimoghaddam -
2020 Poster: The Primal-Dual method for Learning Augmented Algorithms »
Etienne Bamas · Andreas Maggiori · Ola Svensson -
2020 Oral: Fully Dynamic Algorithm for Constrained Submodular Optimization »
Silvio Lattanzi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakub Tarnawski · Morteza Zadimoghaddam -
2020 Oral: The Primal-Dual method for Learning Augmented Algorithms »
Etienne Bamas · Andreas Maggiori · Ola Svensson -
2020 Poster: Sliding Window Algorithms for k-Clustering Problems »
Michele Borassi · Alessandro Epasto · Silvio Lattanzi · Sergei Vassilvitskii · Morteza Zadimoghaddam -
2020 Poster: Learning Augmented Energy Minimization via Speed Scaling »
Etienne Bamas · Andreas Maggiori · Lars Rohwedder · Ola Svensson -
2020 Poster: Fast and Accurate $k$-means++ via Rejection Sampling »
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson -
2020 Poster: Online MAP Inference of Determinantal Point Processes »
Aditya Bhaskara · Amin Karbasi · Silvio Lattanzi · Morteza Zadimoghaddam -
2020 Spotlight: Learning Augmented Energy Minimization via Speed Scaling »
Etienne Bamas · Andreas Maggiori · Lars Rohwedder · Ola Svensson -
2020 Poster: Exact Recovery of Mangled Clusters with Same-Cluster Queries »
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice -
2020 Poster: Fairness in Streaming Submodular Maximization: Algorithms and Hardness »
Marwa El Halabi · Slobodan Mitrović · Ashkan Norouzi-Fard · Jakab Tardos · Jakub Tarnawski -
2020 Oral: Exact Recovery of Mangled Clusters with Same-Cluster Queries »
Marco Bressan · Nicolò Cesa-Bianchi · Silvio Lattanzi · Andrea Paudice -
2020 Session: Orals & Spotlights Track 05: Clustering/Ranking »
Silvio Lattanzi · Katerina Fragkiadaki -
2019 : Coffee Break & Poster Session 1 »
Yan Zhang · Jonathon Hare · Adam Prugel-Bennett · Po Leung · Patrick Flaherty · Pitchaya Wiratchotisatian · Alessandro Epasto · Silvio Lattanzi · Sergei Vassilvitskii · Morteza Zadimoghaddam · Theja Tulabandhula · Fabian Fuchs · Adam Kosiorek · Ingmar Posner · William Hang · Anna Goldie · Sujith Ravi · Azalia Mirhoseini · Yuwen Xiong · Mengye Ren · Renjie Liao · Raquel Urtasun · Haici Zhang · Michele Borassi · Shengda Luo · Andrew Trapp · Geoffroy Dubourg-Felonneau · Yasmeen Kussad · Christopher Bender · Manzil Zaheer · Junier Oliva · Michał Stypułkowski · Maciej Zieba · Austin Dill · Chun-Liang Li · Songwei Ge · Eunsu Kang · Oiwi Parker Jones · Kelvin Ka Wing Wong · Joshua Payne · Yang Li · Azade Nazi · Erkut Erdem · Aykut Erdem · Kevin O'Connor · Juan J Garcia · Maciej Zamorski · Jan Chorowski · Deeksha Sinha · Harry Clifford · John W Cassidy -
2018 Poster: Mallows Models for Top-k Lists »
Flavio Chierichetti · Anirban Dasgupta · Shahrzad Haddadan · Ravi Kumar · Silvio Lattanzi -
2018 Poster: On Coresets for Logistic Regression »
Alexander Munteanu · Chris Schwiegelshohn · Christian Sohler · David Woodruff -
2018 Spotlight: On Coresets for Logistic Regression »
Alexander Munteanu · Chris Schwiegelshohn · Christian Sohler · David Woodruff -
2017 Poster: Hierarchical Clustering Beyond the Worst-Case »
Vincent Cohen-Addad · Varun Kanade · Frederik Mallmann-Trenn -
2017 Poster: Affinity Clustering: Hierarchical Clustering at Scale »
Mohammadhossein Bateni · Soheil Behnezhad · Mahsa Derakhshan · MohammadTaghi Hajiaghayi · Raimondas Kiveris · Silvio Lattanzi · Vahab Mirrokni -
2016 Poster: Community Detection on Evolving Graphs »
Stefano Leonardi · Aris Anagnostopoulos · Jakub Łącki · Silvio Lattanzi · Mohammad Mahdian -
2016 Poster: Linear Relaxations for Finding Diverse Elements in Metric Spaces »
Aditya Bhaskara · Mehrdad Ghadiri · Vahab Mirrokni · Ola Svensson -
2014 Poster: Distributed Balanced Clustering via Mapping Coresets »
Mohammadhossein Bateni · Aditya Bhaskara · Silvio Lattanzi · Vahab Mirrokni