Workshop: Data Centric AI

A concept for fitness-for-use evaluation in Machine Learning pipelines


Data quality is central to Machine Learning (ML) applications but is in many cases not trivial to evaluate. Particular challenges include the validity of quality metrics with regard to ML tasks and data provenance, and the poor reproducibility of data quality assessments. In this paper, we propose to integrate all components of the ML pipeline into the quality assessment process to achieve a concept of fitness-for-use that has a clearly defined area of validity, is reproducible, and can potentially be transferred to other ML pipelines.