
tspDB: Time Series Predict DB
Anish Agarwal · Abdullah Alomar · Devavrat Shah

Tue Dec 08 06:20 AM -- 06:40 AM & Wed Dec 09 06:20 AM -- 06:40 AM (PST)
Event URL: https://colab.research.google.com/drive/1yA3gMVB3XxKYgnSKx0O5dTWMElfX8J2S?usp=sharing

An important goal in Systems for ML is to make ML broadly accessible. Arguably, the major bottleneck is not a lack of access to prediction algorithms, for which many excellent open-source ML libraries exist. Rather, it is the complex data engineering required to move data from a datastore or database (DB) into the format of a particular work environment (e.g., a Spark DataFrame) so that a prediction algorithm can be trained, and to do so at scale. The problem is exacerbated because ML algorithms are now trained on large volumes of data, yet predictions are needed in real time. This is especially true in time-series applications such as finance and real-time control systems.

Towards easing this bottleneck, we showcase tspDB, a system that enables predictive query functionality in any existing time-series relational DB (available open source at tspDB.mit.edu). Specifically, tspDB supports two types of predictive queries for time-series data: (i) imputation, i.e., estimating the value of a data point within the observed range whose observation is missing or noisy; and (ii) forecasting, i.e., predicting a data point in the future. In tspDB, the ML workflow is entirely abstracted away from the user; instead, a single interface is exposed for answering both predictive queries and standard SQL SELECT queries. Pleasingly, we find that tspDB outperforms industry-standard deep-learning-based time-series methods (e.g., DeepAR, LSTMs) in statistical accuracy on benchmark time-series datasets. Further, tspDB's computational cost is close to that of simply inserting and reading data in PostgreSQL, making it a real-time prediction system.
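This page does not describe how tspDB answers the two query types internally. As a rough illustration only, the Python sketch below shows one simple matrix-estimation approach, assumed here for the sake of the example: arrange the series into a Page matrix, impute and denoise it with a truncated SVD, and forecast by linearly regressing the matrix's last row on the other rows. The functions `page_matrix`, `denoise`, `impute`, and `forecast_next` and their parameters (`n_rows`, `rank`) are hypothetical; this is neither tspDB's implementation nor its SQL interface.

```python
import numpy as np


def page_matrix(series, n_rows):
    """Arrange a 1-D series into an (n_rows x n_cols) Page matrix.

    Column j holds the non-overlapping segment series[j*n_rows:(j+1)*n_rows];
    any trailing remainder that does not fill a whole column is dropped.
    """
    n_cols = len(series) // n_rows
    values = np.asarray(series[: n_rows * n_cols], dtype=float)
    return values.reshape(n_cols, n_rows).T


def denoise(matrix, rank):
    """Low-rank estimate of a Page matrix that may contain NaNs.

    Missing entries are filled with 0, the matrix is rescaled by the observed
    fraction p so its expectation matches the clean matrix, and only the top
    `rank` singular components are kept.
    """
    observed = ~np.isnan(matrix)
    p = max(observed.mean(), 1e-12)
    filled = np.where(observed, matrix, 0.0) / p
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def impute(series, n_rows, rank):
    """Query type (i): estimate every point in the observed range."""
    estimate = denoise(page_matrix(series, n_rows), rank)
    return estimate.T.reshape(-1)  # back to time order


def forecast_next(series, n_rows, rank):
    """Query type (ii): one-step-ahead forecast.

    Fits weights beta so the last row of the denoised Page matrix is a linear
    combination of the first n_rows - 1 rows, then applies beta to the most
    recent n_rows - 1 observations (assumed here to contain no NaNs).
    """
    estimate = denoise(page_matrix(series, n_rows), rank)
    features, target = estimate[:-1].T, estimate[-1]
    beta, *_ = np.linalg.lstsq(features, target, rcond=None)
    recent = np.asarray(series[-(n_rows - 1):], dtype=float)
    return float(recent @ beta)


if __name__ == "__main__":
    t = np.arange(1000)
    series = np.sin(0.3 * t)
    with_gaps = series.copy()
    with_gaps[::17] = np.nan  # hide every 17th point
    print("imputed t=17:", impute(with_gaps, n_rows=20, rank=2)[17], "true:", series[17])
    print("forecast t=1000:", forecast_next(series, n_rows=20, rank=2),
          "true:", np.sin(0.3 * 1000))
```

In this sketch both query types reuse the same low-rank estimate, which loosely mirrors the idea of serving imputation and forecasting through one interface. The number of rows and the retained rank are tuning knobs that would have to be chosen per dataset.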

The demo itself will be run entirely through a Google Colab notebook that users can open in a browser, with no software installation required. The notebook will walk through how to use tspDB to make predictive SQL queries on retail, energy, and financial data, and how to measure its computational performance relative to standard SQL queries. A pre-recording of the entire demo will also be provided.

Author Information

Anish Agarwal (Massachusetts Institute of Technology)
Abdullah Alomar (Massachusetts Institute of Technology)
Devavrat Shah (Massachusetts Institute of Technology)

Devavrat Shah is a professor of Electrical Engineering & Computer Science and Director of Statistics and Data Science at MIT. He received his PhD in Computer Science from Stanford. He received the Erlang Prize from the Applied Probability Society of INFORMS in 2010 and a NeurIPS best paper award in 2008.
