NeurIPS Expo Demonstration LLM-Powered Intelligent Data Engineering: From Workflow Design to Ingestion and Quality Assurance

Expo Demonstration

LLM-Powered Intelligent Data Engineering: From Workflow Design to Ingestion and Quality Assurance

Shashank Mujumdar

Upper Level Room 29A-D

Tue 2 Dec noon PST — 3 p.m. PST

Abstract:

Modern enterprises depend on efficient data engineering pipelines to unlock value from diverse and large-scale datasets. Yet, current processes for workflow design, schema ingestion, and data quality validation remain complex, error-prone, and dependent on technical expertise. This creates barriers for non-expert users, slows down development, and introduces risks of data inconsistency.

We present a suite of LLM-powered frameworks that reimagine enterprise data engineering across three critical dimensions: (i) From Natural Language to Executable ETL Flows, enabling intuitive pipeline creation with natural language specifications and automatic operator/property inference, (ii) All You Can Ingest, an end-to-end schema mapping and transformation framework that unifies semantic alignment, code synthesis, and robust validation, and (iii) Quality Assessment of Tabular Data, a scalable approach for auto-generating interpretable quality rules and executable validators tailored to specific datasets.

Together, these innovations demonstrate how Large Language Models (LLMs), augmented with retrieval, code synthesis, reasoning, and guardrails, can transform the data engineering lifecycle into a more accessible, adaptive, and trustworthy process, reducing manual effort, accelerating time-to-value, and ensuring data fidelity at enterprise scale.
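To make the idea of "interpretable quality rules and executable validators" in (iii) concrete, here is a minimal sketch in Python. It is not the presenters' framework: the `QualityRule` type, the `validate` helper, the two rules, and the sample rows are all hypothetical, and the rules are written by hand where a real system would presumably have an LLM propose them for a given dataset.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class QualityRule:
    """A human-readable rule paired with an executable check (hypothetical type)."""
    name: str
    description: str
    check: Callable[[Row], bool]


def validate(rows: list[Row], rules: list[QualityRule]) -> dict[str, list[int]]:
    """Return, for each rule name, the indices of the rows that violate it."""
    violations: dict[str, list[int]] = {rule.name: [] for rule in rules}
    for index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row):
                violations[rule.name].append(index)
    return violations


# Hand-written stand-ins for rules an LLM might generate (assumption).
RULES = [
    QualityRule(
        name="age_in_range",
        description="age must be an integer between 0 and 120",
        check=lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    ),
    QualityRule(
        name="email_present",
        description="email must be a string containing '@'",
        check=lambda r: isinstance(r.get("email"), str) and "@" in r["email"],
    ),
]

# Illustrative sample data (assumption).
ROWS = [
    {"age": 34, "email": "a@example.com"},
    {"age": -5, "email": "b@example.com"},
    {"age": 51, "email": ""},
]

REPORT = validate(ROWS, RULES)
```

Keeping the plain-language `description` next to the executable `check` is one way a rule could stay interpretable for non-expert users while still being runnable against the data.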
