Poster
Dimensionality Reduction of Massive Sparse Datasets Using Coresets
Dan Feldman · Mikhail Volkov · Daniela Rus
In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any $n\times d$ matrix, using one pass over the stream of its rows. Our solution uses coresets: a coreset is a scaled subset of the $n$ rows that approximates their sum of squared distances to every $k$-dimensional affine subspace. An open theoretical problem has been to compute such a coreset whose size is independent of both $n$ and $d$. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases, such as the Wikipedia document-term matrix, in a reasonable time. We answer both of these questions affirmatively. Our main technical result is a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream.
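The quantity a coreset must preserve can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the helper names `affine_subspace_cost` and `pca_subspace` are hypothetical, the data is synthetic, and the weighted subset is a naive uniform sample rather than the deterministic construction described in the abstract. It shows how the weighted sum of squared distances to a $k$-dimensional affine subspace is evaluated on the full matrix and on a scaled subset of its rows.

```python
import numpy as np


def affine_subspace_cost(rows, weights, basis, offset):
    """Weighted sum of squared distances from rows to offset + span(basis).

    `basis` is a d x k matrix with orthonormal columns (hypothetical helper).
    """
    centered = rows - offset
    projected = centered @ basis @ basis.T
    residual = centered - projected
    return float(weights @ np.sum(residual ** 2, axis=1))


def pca_subspace(rows, k):
    """Best-fit k-dimensional affine subspace: the mean plus top-k right singular vectors."""
    offset = rows.mean(axis=0)
    _, _, vt = np.linalg.svd(rows - offset, full_matrices=False)
    return vt[:k].T, offset


rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 10))  # synthetic stand-in for an n x d matrix
n = A.shape[0]

# Assumption: a naive uniform sample of m rows, each scaled by n / m.
# This is NOT the paper's coreset construction and carries no guarantee.
m = 200
idx = rng.choice(n, size=m, replace=False)
C = A[idx]
w = np.full(m, n / m)

basis, offset = pca_subspace(A, k=3)
full_cost = affine_subspace_cost(A, np.ones(n), basis, offset)
approx_cost = affine_subspace_cost(C, w, basis, offset)
relative_error = abs(approx_cost - full_cost) / full_cost
```

A uniform sample like this only tracks the cost for the one subspace it is checked against, and its size would generally have to grow with the data. The coreset in the abstract is stronger: it must approximate the cost for every $k$-dimensional affine subspace simultaneously.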
Author Information
Dan Feldman (University of Haifa)
Mikhail Volkov (MIT)
Daniela Rus (MIT)
More from the Same Authors
2021 Spotlight: Coresets for Decision Trees of Signals »
Ibrahim Jubran · Ernesto Evgeniy Sanches Shayda · Ilan I Newman · Dan Feldman -
2021 : Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks »
Ryan Sander · Wilko Schwarting · Tim Seyde · Igor Gilitschenski · Sertac Karaman · Daniela Rus -
2021 : Strength Through Diversity: Robust Behavior Learning via Mixture Policies »
Tim Seyde · Wilko Schwarting · Igor Gilitschenski · Markus Wulfmeier · Daniela Rus -
2022 : PyHopper - A Plug-and-Play Hyperparameter Optimization Engine »
Mathias Lechner · Ramin Hasani · Sophie Neubauer · Philipp Neubauer · Daniela Rus -
2022 : Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality Gap »
Mathias Lechner · Ramin Hasani · Alexander Amini · Tsun-Hsuan Johnson Wang · Thomas Henzinger · Daniela Rus -
2022 : Infrastructure-based End-to-End Learning and Prevention of Driver Failure »
Noam Buckman · Shiva Sreeram · Mathias Lechner · Yutong Ban · Ramin Hasani · Sertac Karaman · Daniela Rus -
2022 : Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks »
Sadhana Lolla · Iaroslav Elistratov · Alejandro Perez · Elaheh Ahmadi · Daniela Rus · Alexander Amini -
2022 Poster: Efficient Dataset Distillation using Random Feature Approximation »
Noel Loo · Ramin Hasani · Alexander Amini · Daniela Rus -
2022 Poster: Coreset for Line-Sets Clustering »
Sagi Lotan · Ernesto Evgeniy Sanches Shayda · Dan Feldman -
2022 Poster: Evolution of Neural Tangent Kernels under Benign and Adversarial Training »
Noel Loo · Ramin Hasani · Alexander Amini · Daniela Rus -
2022 Poster: ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment »
Joseph DelPreto · Chao Liu · Yiyue Luo · Michael Foshey · Yunzhu Li · Antonio Torralba · Wojciech Matusik · Daniela Rus -
2021 Poster: Sparse Flows: Pruning Continuous-depth Models »
Lucas Liebenwein · Ramin Hasani · Alexander Amini · Daniela Rus -
2021 Poster: Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition »
Lucas Liebenwein · Alaa Maalouf · Dan Feldman · Daniela Rus -
2021 Poster: Causal Navigation by Continuous-time Neural Networks »
Charles Vorbach · Ramin Hasani · Alexander Amini · Mathias Lechner · Daniela Rus -
2021 Poster: Coresets for Decision Trees of Signals »
Ibrahim Jubran · Ernesto Evgeniy Sanches Shayda · Ilan I Newman · Dan Feldman -
2021 Poster: Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies »
Tim Seyde · Igor Gilitschenski · Wilko Schwarting · Bartolomeo Stellato · Martin Riedmiller · Markus Wulfmeier · Daniela Rus -
2020 Poster: Deep Evidential Regression »
Alexander Amini · Wilko Schwarting · Ava P Soleimany · Daniela Rus -
2020 Poster: Coresets for Near-Convex Functions »
Murad Tukan · Alaa Maalouf · Dan Feldman -
2019 Poster: Fast and Accurate Least-Mean-Squares Solvers »
Ibrahim Jubran · Alaa Maalouf · Dan Feldman -
2019 Poster: Learning-In-The-Loop Optimization: End-To-End Control And Co-Design Of Soft Robots Through Learned Deep Latent Representations »
Andrew Spielberg · Allan Zhao · Yuanming Hu · Tao Du · Wojciech Matusik · Daniela Rus -
2019 Oral: Fast and Accurate Least-Mean-Squares Solvers »
Ibrahim Jubran · Alaa Maalouf · Dan Feldman -
2019 Poster: k-Means Clustering of Lines for Big Data »
Yair Marom · Dan Feldman -
2014 Poster: Coresets for k-Segmentation of Streaming Data »
Guy Rosman · Mikhail Volkov · Dan Feldman · John Fisher III · Daniela Rus