Spotlight
On Coresets for Logistic Regression
Alexander Munteanu · Chris Schwiegelshohn · Christian Sohler · David Woodruff
Coresets are one of the central methods to facilitate the analysis of large data. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show the negative result that no strongly sublinear-sized coresets exist for logistic regression. To deal with intractable worst-case instances, we introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $\mu(X)$ has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset. We illustrate the performance of our method by comparing it to uniform sampling as well as to state-of-the-art methods in the area. The experiments are conducted on real-world benchmark data for logistic regression.
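To make the idea of a weighted coreset concrete, the following is a minimal sketch of importance sampling for the logistic loss. It is not the authors' exact scheme: the abstract does not specify how sensitivities are computed, so the scores used here (row norms of an orthonormal basis of the data, mixed with a uniform term) are an assumption made for illustration. The function names `logistic_loss` and `sample_coreset` are hypothetical, as is the convention that each row is $z_i = -y_i x_i$ with labels $y_i \in \{-1, 1\}$.

```python
import numpy as np


def logistic_loss(Z, beta, weights=None):
    """Sum of ln(1 + exp(z_i . beta)) over the rows z_i of Z.

    Assumed convention: z_i = -y_i * x_i with labels y_i in {-1, +1}.
    If weights are given, each row's loss is multiplied by its weight.
    """
    margins = Z @ beta
    losses = np.logaddexp(0.0, margins)  # numerically stable ln(1 + exp(m))
    if weights is None:
        return float(losses.sum())
    return float(weights @ losses)


def sample_coreset(Z, k, rng):
    """Draw k rows of Z with replacement and return them with weights.

    Assumption (illustrative, not taken from the abstract): the sampling
    score of row i is the norm of row i of Q, where Z = QR is a reduced
    QR decomposition, plus a uniform term 1/n.
    """
    n = Z.shape[0]
    Q, _ = np.linalg.qr(Z)  # Q has shape (n, d) with orthonormal columns
    scores = np.linalg.norm(Q, axis=1) + 1.0 / n
    probs = scores / scores.sum()
    idx = rng.choice(n, size=k, replace=True, p=probs)
    # Inverse-probability weights make the weighted loss an unbiased
    # estimate of the full loss for any fixed beta.
    weights = 1.0 / (k * probs[idx])
    return Z[idx], weights
```

For a fixed parameter vector `beta`, `logistic_loss(C, beta, w)` on the sample `(C, w)` returned by `sample_coreset` can then be compared with `logistic_loss(Z, beta)` on the full data. A $(1\pm\varepsilon)$-coreset guarantee is stronger than this per-vector unbiasedness, since it requires the approximation to hold for all `beta` simultaneously.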
Author Information
Alexander Munteanu (TU Dortmund)
Chris Schwiegelshohn (Sapienza, University of Rome)
Christian Sohler (TU Dortmund)
David Woodruff (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: On Coresets for Logistic Regression
  Wed. Dec 5th through Thu. the 6th, Room 210 #77
More from the Same Authors
- 2021 Spotlight: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces
  Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn
- 2022 Spotlight: Optimal Query Complexities for Dynamic Trace Estimation
  David Woodruff · Fred Zhang · Richard Zhang
- 2022 Poster: Optimal Query Complexities for Dynamic Trace Estimation
  David Woodruff · Fred Zhang · Richard Zhang
- 2021 Poster: Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces
  Vincent Cohen-Addad · David Saulpic · Chris Schwiegelshohn
- 2021 Poster: Parallel and Efficient Hierarchical k-Median Clustering
  Vincent Cohen-Addad · Silvio Lattanzi · Ashkan Norouzi-Fard · Christian Sohler · Ola Svensson
- 2019 Poster: Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels
  Michela Meister · Tamas Sarlos · David Woodruff
- 2019 Poster: Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss
  Zhao Song · David Woodruff · Peilin Zhong
- 2019 Poster: Efficient and Thrifty Voting by Any Means Necessary
  Debmalya Mandal · Ariel Procaccia · Nisarg Shah · David Woodruff
- 2019 Poster: Regularized Weighted Low Rank Approximation
  Frank Ban · David Woodruff · Richard Zhang
- 2019 Oral: Efficient and Thrifty Voting by Any Means Necessary
  Debmalya Mandal · Ariel Procaccia · Nisarg Shah · David Woodruff
- 2019 Poster: Total Least Squares Regression in Input Sparsity Time
  Huaian Diao · Zhao Song · David Woodruff · Xin Yang
- 2019 Poster: Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
  Huaian Diao · Rajesh Jayaram · Zhao Song · Wen Sun · David Woodruff
- 2019 Poster: Fully Dynamic Consistent Facility Location
  Vincent Cohen-Addad · Niklas Oskar D Hjuler · Nikos Parotsidis · David Saulpic · Chris Schwiegelshohn
- 2019 Poster: Towards a Zero-One Law for Column Subset Selection
  Zhao Song · David Woodruff · Peilin Zhong
- 2019 Poster: Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves
  Stefan Meintrup · Alexander Munteanu · Dennis Rohde
- 2018 Poster: Robust Subspace Approximation in a Stream
  Roie Levin · Anish Prasad Sevekari · David Woodruff
- 2018 Spotlight: Robust Subspace Approximation in a Stream
  Roie Levin · Anish Prasad Sevekari · David Woodruff
- 2018 Poster: Sublinear Time Low-Rank Approximation of Distance Matrices
  Ainesh Bakshi · David Woodruff
- 2018 Spotlight: Sublinear Time Low-Rank Approximation of Distance Matrices
  Ainesh Bakshi · David Woodruff