Workshop: Data Centric AI

Towards a Framework for Data Excellence in Data-Centric AI: Lessons from the Semantic Web


The Semantic Web community takes pride in developing technologies that make data significantly more valuable, interpretable, and computationally friendly by annotating it with community-accepted vocabularies. For example, the core Semantic Web technology, the Resource Description Framework (RDF), is designed to describe resources in a machine-readable way. Standardized ontologies such as PROV, the W3C-recommended standard for capturing provenance on the Web, were designed to record data lineage. This position paper argues that such semantic technologies could improve the quality of the data used to train machine learning models, increase the models' accuracy, and make them more transparent and interpretable. We further argue that "a framework for excellence in data engineering," as put forth by the Data-centric AI workshop proposal, has existed since the early 2000s, but that the Machine Learning and Semantic Web communities need to collaborate to realize its full potential.
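To make the idea of PROV-based lineage concrete, here is a minimal sketch that is not taken from the paper. It records the provenance of a hypothetical training set as RDF triples using PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) and then walks the derivation chain back to the raw source. The `example.org` resources and the helpers `describe_training_data`, `to_ntriples`, and `lineage` are hypothetical. Plain Python tuples stand in for an RDF library so the sketch has no dependencies; a real pipeline would more likely store the triples in a triple store and query them with SPARQL.

```python
from typing import List, Tuple

# W3C vocabulary IRIs: PROV-O namespace and rdf:type.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for the example datasets and processing steps.
EX = "http://example.org/"

Triple = Tuple[str, str, str]


def describe_training_data() -> List[Triple]:
    """Return triples recording how a (hypothetical) training split was produced."""
    return [
        (EX + "raw-crawl", RDF_TYPE, PROV + "Entity"),
        (EX + "cleaned-v1", RDF_TYPE, PROV + "Entity"),
        (EX + "train-split", RDF_TYPE, PROV + "Entity"),
        (EX + "dedup-run", RDF_TYPE, PROV + "Activity"),
        (EX + "dedup-run", PROV + "used", EX + "raw-crawl"),
        (EX + "cleaned-v1", PROV + "wasGeneratedBy", EX + "dedup-run"),
        (EX + "cleaned-v1", PROV + "wasDerivedFrom", EX + "raw-crawl"),
        (EX + "train-split", PROV + "wasDerivedFrom", EX + "cleaned-v1"),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples: one '<s> <p> <o> .' per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def lineage(entity: str, triples: List[Triple]) -> List[str]:
    """Follow prov:wasDerivedFrom links from `entity` back to its origin."""
    chain = [entity]
    seen = {entity}
    current = entity
    while True:
        parents = [o for s, p, o in triples
                   if s == current and p == PROV + "wasDerivedFrom"]
        if not parents:
            return chain
        nxt = parents[0]  # this sketch assumes at most one parent per entity
        if nxt in seen:   # guard against cyclic (malformed) provenance
            return chain
        seen.add(nxt)
        chain.append(nxt)
        current = nxt


if __name__ == "__main__":
    triples = describe_training_data()
    print(to_ntriples(triples))
    # Prints the train-split, cleaned-v1, and raw-crawl IRIs, in that order.
    print(lineage(EX + "train-split", triples))
```

Because each derivation step is an explicit, machine-readable statement, a model's training data can be traced back to its sources by a generic query rather than by reading pipeline code, which is the kind of transparency the abstract attributes to semantic annotation.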