Timezone: »
Poster
Improved Coresets for Euclidean $k$-Means
Vincent Cohen-Addad · Kasper Green Larsen · David Saulpic · Chris Schwiegelshohn · Omar Ali Sheikh-Omar
Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. Euclidean $k$-median) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. The arguably most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that for any candidate solution, the ratio between coreset cost and the cost of the original instance is less than a $(1\pm \varepsilon)$ factor. The current state of the art coreset size is $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for Euclidean $k$-means and $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for Euclidean $k$-median. The best known lower bound for both problems is $\Omega(k\varepsilon^{-2})$. In this paper, we improve these bounds to $\tilde O(\min(k^{3/2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for Euclidean $k$-means and $\tilde O(\min(k^{4/3} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for Euclidean $k$-median. In particular, ours is the first provable bound that breaks through the $k^2$ barrier while retaining an optimal dependency on $\varepsilon$.
Author Information
Vincent Cohen-Addad (Google research)
Kasper Green Larsen (Aarhus University)
David Saulpic (Sorbonne Université-LIP6)
Chris Schwiegelshohn (Aarhus University)
Omar Ali Sheikh-Omar (Aarhus University)
More from the Same Authors
-
2021 Spotlight: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2022 : Scalable and Improved Algorithms for Individually Fair Clustering »
Mohammadhossein Bateni · Vincent Cohen-Addad · Alessandro Epasto · Silvio Lattanzi -
2022 Poster: Near-Optimal Correlation Clustering with Privacy »
Vincent Cohen-Addad · Chenglin Fan · Silvio Lattanzi · Slobodan Mitrovic · Ashkan Norouzi-Fard · Nikos Parotsidis · Jakub Tarnawski -
2022 Poster: Near-Optimal Private and Scalable $k$-Clustering »
Vincent Cohen-Addad · Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong -
2022 Poster: Optimal Weak to Strong Learning »
Kasper Green Larsen · Martin Ritzert -
2021 Poster: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces »
Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn -
2021 Poster: Parallel and Efficient Hierarchical k-Median Clustering »
Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson -
2020 Poster: On the Power of Louvain in the Stochastic Block Model »
Vincent Cohen-Addad · Adrian Kosowski · Frederik Mallmann-Trenn · David Saulpic -
2020 Poster: Margins are Insufficient for Explaining Gradient Boosting »
Allan Grønlund · Lior Kamma · Kasper Green Larsen -
2019 Poster: Margin-Based Generalization Lower Bounds for Boosted Classifiers »
Allan Grønlund · Lior Kamma · Kasper Green Larsen · Alexander Mathiasen · Jelani Nelson -
2019 Poster: Fully Dynamic Consistent Facility Location »
Vincent Cohen-Addad · Niklas Oskar D Hjuler · Nikos Parotsidis · David Saulpic · Chris Schwiegelshohn -
2018 Poster: Fully Understanding The Hashing Trick »
Lior Kamma · Casper B. Freksen · Kasper Green Larsen -
2018 Spotlight: Fully Understanding The Hashing Trick »
Casper B. Freksen · Lior Kamma · Kasper Green Larsen -
2017 Poster: Hierarchical Clustering Beyond the Worst-Case »
Vincent Cohen-Addad · Varun Kanade · Frederik Mallmann-Trenn