Keywords: relational algebra, scikit pipelines, machine learning
TL;DR: This paper proposes RASL, an open-source library of relational algebra (RA) operators for scikit-learn (SL).
Abstract: Integrating data preparation with machine-learning (ML) pipelines has been a long-standing challenge. Prior work tried to solve it by building new data processing platforms such as MapReduce or Spark, and then implementing new libraries of ML algorithms for those. But despite the availability of these platforms, many ML practitioners continue to use scikit-learn instead, owing to its clean design and rich set of algorithms. Therefore, this paper proposes a different approach: instead of extending a data processing platform for ML, extend an ML library for data processing. Specifically, this paper proposes RASL, an open-source library of relational algebra (RA) operators for scikit-learn (SL). We illustrate RASL with a detailed case study involving joins and aggregation across multi-table input data. We hope our approach will lead to cleaner integration of data preparation with machine learning in practice.
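To make the idea in the abstract concrete, the following is a minimal sketch of relational operators written as scikit-learn-style transformers (objects with `fit` and `transform`), applied to two-table input with a join followed by an aggregation. It is not RASL's actual API: the class names `Join` and `GroupByAggregate`, the helper `run_pipeline`, and the sample tables are all hypothetical, and the sketch uses plain Python lists of dicts rather than scikit-learn or pandas so that it has no dependencies.

```python
from collections import defaultdict


class Join:
    """Hypothetical inner equi-join of two tables (lists of dicts) on one key."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self  # stateless operator: nothing to learn

    def transform(self, X):
        left, right = X  # multi-table input arrives as a pair of tables
        index = defaultdict(list)
        for row in right:
            index[row[self.key]].append(row)
        # Emit one merged row per matching (left, right) pair.
        return [{**l, **r} for l in left for r in index.get(l[self.key], [])]


class GroupByAggregate:
    """Hypothetical group-by on one column with a single aggregate function."""

    def __init__(self, by, column, func, out):
        self.by, self.column, self.func, self.out = by, column, func, out

    def fit(self, X, y=None):
        return self  # stateless operator: nothing to learn

    def transform(self, X):
        groups = defaultdict(list)
        for row in X:
            groups[row[self.by]].append(row[self.column])
        return [{self.by: k, self.out: self.func(v)} for k, v in groups.items()]


def run_pipeline(steps, X):
    """Apply each step in order, mimicking fit/transform chaining in a pipeline."""
    for step in steps:
        X = step.fit(X).transform(X)
    return X


# Assumed sample data, for illustration only.
customers = [{"cust_id": 1, "region": "east"}, {"cust_id": 2, "region": "west"}]
orders = [
    {"cust_id": 1, "amount": 10.0},
    {"cust_id": 1, "amount": 5.0},
    {"cust_id": 2, "amount": 7.5},
]

features = run_pipeline(
    [Join("cust_id"), GroupByAggregate("cust_id", "amount", sum, "total_amount")],
    (customers, orders),
)
print(features)
```

Because each operator exposes the same `fit`/`transform` interface as an ordinary estimator, data preparation steps of this kind could in principle sit in the same pipeline as a downstream model, which is the integration the abstract argues for.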
Author Information
Chirag Sahni (Rensselaer Polytechnic Institute)
Kiran Kate (IBM Research)
Avi Shinnar (International Business Machines)
Thanh Lam Hoang (IBM Research)
Martin Hirzel (IBM Research AI)
More from the Same Authors
2021: RASL: Relational Algebra in Scikit-Learn Pipelines
Kiran Kate · Avi Shinnar · Thanh Lam Hoang · Martin Hirzel
2022: c-MBA: Adversarial Attack for Cooperative MARL Using Learned Dynamics Model
Nhan H Pham · Lam Nguyen · Jie Chen · Thanh Lam Hoang · Subhro Das · Lily Weng
2021 Poster: Pipeline Combinators for Gradual AutoML
Guillaume Baudart · Martin Hirzel · Kiran Kate · Parikshit Ram · Avi Shinnar · Jason Tsay
2021 Poster: Ensembling Graph Predictions for AMR Parsing
Thanh Lam Hoang · Gabriele Picco · Yufang Hou · Young-Suk Lee · Lam Nguyen · Dzung Phan · Vanessa Lopez · Ramon Fernandez Astudillo
2020 Expo Demonstration: Beyond AutoML: AI Automation & Scaling
Lisa Amini · Nitin Gupta · Parikshit Ram · Kiran Kate · Bhanukiran Vinzamuri · Nathalie Baracaldo · Martin Korytak · Daniel K Weidele · Dakuo Wang
2017: Poster Session / Coffee Break
Hongyu Ren · Sheng Lundquist · Steven Hickson · Abhimanyu Dubey · Saki Shinoda · Ana Marasović · Otilia Stretcu · Fitsum Reda · Vikas Raunak · Cicero dos Santos · Liane Canas · Jesus Mager Hois · Martin Hirzel