Timezone: »
Life-logging video streams, financial time series, and Twitter tweets are a few examples of high-dimensional signals over practically unbounded time. We consider the problem of computing optimal segmentation of such signals by k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal. The coreset enables fast further analysis such as automatic summarization and analysis of such signals. A coreset (core-set) is a compact representation of the data seen so far, which approximates the data well for a specific task -- in our case, segmentation of the stream. We show that, perhaps surprisingly, the segmentation problem admits coresets of cardinality only linear in the number of segments k, independently of both the dimension d of the signal, and its number n of points. More precisely, we construct a representation of size O(klog n /eps^2) that provides a (1+eps)-approximation for the sum of squared distances to any given k-piecewise linear function. Moreover, such coresets can be constructed in a parallel streaming approach. Our results rely on a novel eduction of statistical estimations to problems in computational geometry. We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in Amazon cloud.
Author Information
Guy Rosman (Massachusetts Institute of Technology)
Mikhail Volkov (Massachusetts Institute of Technology)
Dan Feldman (Massachusetts Institute of Technology)
John Fisher III (MIT)
Daniela Rus (Massachusetts Institute of Technology)
More from the Same Authors
-
2021 : Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks »
Ryan Sander · Wilko Schwarting · Tim Seyde · Igor Gilitschenski · Sertac Karaman · Daniela Rus -
2021 : Strength Through Diversity: Robust Behavior Learning via Mixture Policies »
Tim Seyde · Wilko Schwarting · Igor Gilitschenski · Markus Wulfmeier · Daniela Rus -
2022 : Posterior Consistency for Gaussian Process Surrogate Models with Generalized Observations »
Rujian Chen · John Fisher III -
2022 : PyHopper - A Plug-and-Play Hyperparameter Optimization Engine »
Mathias Lechner · Ramin Hasani · Sophie Neubauer · Philipp Neubauer · Daniela Rus -
2022 : Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality Gap »
Mathias Lechner · Ramin Hasani · Alexander Amini · Tsun-Hsuan Johnson Wang · Thomas Henzinger · Daniela Rus -
2022 : Infrastructure-based End-to-End Learning and Prevention of Driver Failure »
Noam Buckman · Shiva Sreeram · Mathias Lechner · Yutong Ban · Ramin Hasani · Sertac Karaman · Daniela Rus -
2022 : Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks »
Sadhana Lolla · Iaroslav Elistratov · Alejandro Perez · Elaheh Ahmadi · Daniela Rus · Alexander Amini -
2022 : Infrastructure-based End-to-End Learning and Prevention of Driver Failure »
Noam Buckman · Shiva Sreeram · Mathias Lechner · Yutong Ban · Ramin Hasani · Sertac Karaman · Daniela Rus -
2022 : Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks »
Sadhana Lolla · Iaroslav Elistratov · Alejandro Perez · Elaheh Ahmadi · Daniela Rus · Alexander Amini -
2023 Poster: On the Size and Approximation Error of Distilled Datasets »
Alaa Maalouf · Murad Tukan · Noel Loo · Ramin Hasani · Mathias Lechner · Daniela Rus -
2023 Poster: DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models »
Tsun-Hsuan Johnson Wang · Juntian Zheng · Pingchuan Ma · Yilun Du · Byungchul Kim · Andrew Spielberg · Josh Tenenbaum · Chuang Gan · Daniela Rus -
2023 Poster: Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning »
Mathias Lechner · lianhao yin · Tim Seyde · Tsun-Hsuan Johnson Wang · Wei Xiao · Ramin Hasani · Joshua Rountree · Daniela Rus -
2023 Oral: DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models »
Tsun-Hsuan Johnson Wang · Juntian Zheng · Pingchuan Ma · Yilun Du · Byungchul Kim · Andrew Spielberg · Josh Tenenbaum · Chuang Gan · Daniela Rus -
2022 Poster: Efficient Dataset Distillation using Random Feature Approximation »
Noel Loo · Ramin Hasani · Alexander Amini · Daniela Rus -
2022 Poster: Evolution of Neural Tangent Kernels under Benign and Adversarial Training »
Noel Loo · Ramin Hasani · Alexander Amini · Daniela Rus -
2022 Poster: ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment »
Joseph DelPreto · Chao Liu · Yiyue Luo · Michael Foshey · Yunzhu Li · Antonio Torralba · Wojciech Matusik · Daniela Rus -
2021 Poster: Sparse Flows: Pruning Continuous-depth Models »
Lucas Liebenwein · Ramin Hasani · Alexander Amini · Daniela Rus -
2021 Poster: Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition »
Lucas Liebenwein · Alaa Maalouf · Dan Feldman · Daniela Rus -
2021 Poster: Causal Navigation by Continuous-time Neural Networks »
Charles Vorbach · Ramin Hasani · Alexander Amini · Mathias Lechner · Daniela Rus -
2021 Poster: Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies »
Tim Seyde · Igor Gilitschenski · Wilko Schwarting · Bartolomeo Stellato · Martin Riedmiller · Markus Wulfmeier · Daniela Rus -
2020 Poster: Sequential Bayesian Experimental Design with Variable Cost Structure »
Sue Zheng · David Hayden · Jason Pacheco · John Fisher III -
2020 Poster: Deep Evidential Regression »
Alexander Amini · Wilko Schwarting · Ava P Soleimany · Daniela Rus -
2020 Poster: Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information »
Genevieve Flaspohler · Nick Roy · John Fisher III -
2019 Poster: Learning-In-The-Loop Optimization: End-To-End Control And Co-Design Of Soft Robots Through Learned Deep Latent Representations »
Andrew Spielberg · Allan Zhao · Yuanming Hu · Tao Du · Wojciech Matusik · Daniela Rus -
2016 Poster: Dimensionality Reduction of Massive Sparse Datasets Using Coresets »
Dan Feldman · Mikhail Volkov · Daniela Rus -
2015 Poster: Streaming, Distributed Variational Inference for Bayesian Nonparametrics »
Trevor Campbell · Julian Straub · John Fisher III · Jonathan How -
2015 Poster: Probabilistic Variational Bounds for Graphical Models »
Qiang Liu · John Fisher III · Alexander Ihler -
2014 Poster: Parallel Sampling of HDPs using Sub-Cluster Splits »
Jason Chang · John Fisher III -
2013 Poster: Parallel Sampling of DP Mixture Models using Sub-Cluster Splits »
Jason Chang · John Fisher III -
2012 Workshop: Bayesian Nonparametric Models For Reliable Planning And Decision-Making Under Uncertainty »
Jonathan How · Lawrence Carin · John Fisher III · Michael Jordan · Alborz Geramifard -
2012 Poster: Coupling Nonparametric Mixtures via Latent Dirichlet Processes »
Dahua Lin · John Fisher III -
2011 Oral: Scalable Training of Mixture Models via Coresets »
Dan Feldman · Matthew Faulkner · Andreas Krause -
2011 Poster: Scalable Training of Mixture Models via Coresets »
Dan Feldman · Matthew Faulkner · Andreas Krause -
2010 Oral: Construction of Dependent Dirichlet Processes based on Poisson Processes »
Dahua Lin · Eric Grimson · John Fisher III -
2010 Poster: Construction of Dependent Dirichlet Processes based on Poisson Processes »
Dahua Lin · Eric Grimson · John Fisher III