Workshop: Data Centric AI

AutoDQ: Automatic Data Quality for Financial Data


Financial services companies depend on petabytes of data to make decisions about investments, services, and operations. Data-centric methods are needed to ensure the quality of the data used for ML model-based and other business process automation. This paper presents AutoDQ, an end-to-end data quality assurance framework that monitors production data quality and leverages ML to identify and select validation constraints. AutoDQ introduces novel unit tests derived from the automatic extraction of data semantics and inter-column relationships, in addition to constraints based on predictability and statistical profiling of data. It operates on both tabular and time-series data without requiring a schema or any metadata. The components of our framework have been tested over 100 public datasets as well as several internal transactional datasets.
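
To make two of the constraint families named above more concrete, the following is a minimal sketch of a statistical-profiling constraint and an inter-column relationship check. It is not AutoDQ's implementation, which the abstract does not describe. The names `RangeConstraint`, `profile_range`, and `holds_sum_relationship`, the mean plus or minus k standard deviations rule, and the toy transaction rows are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RangeConstraint:
    """Hypothetical constraint: values of `column` must lie in [low, high]."""
    column: str
    low: float
    high: float

    def violations(self, rows):
        # Return the indices of rows whose value falls outside the learned range.
        return [i for i, row in enumerate(rows)
                if not (self.low <= row[self.column] <= self.high)]


def profile_range(rows, column, k=3.0):
    """Learn a mean +/- k * stdev range from reference rows (assumed clean)."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), stdev(values)
    return RangeConstraint(column, mu - k * sigma, mu + k * sigma)


def holds_sum_relationship(rows, a, b, total, tol=1e-9):
    """Check a candidate inter-column relationship: a + b == total on every row."""
    return all(abs(row[a] + row[b] - row[total]) <= tol for row in rows)


# Toy reference data (invented for this sketch, not from the paper).
reference = [
    {"principal": 100.0, "fee": 1.0, "amount": 101.0},
    {"principal": 110.0, "fee": 1.0, "amount": 111.0},
    {"principal": 90.0, "fee": 1.0, "amount": 91.0},
    {"principal": 105.0, "fee": 1.0, "amount": 106.0},
]
# A new production batch containing one implausibly large value.
new_batch = [
    {"principal": 102.0, "fee": 1.0, "amount": 103.0},
    {"principal": 5000.0, "fee": 1.0, "amount": 5001.0},
]

constraint = profile_range(reference, "principal")
bad_rows = constraint.violations(new_batch)
relationship_ok = holds_sum_relationship(reference, "principal", "fee", "amount")
```

In this sketch the constraint is learned once from reference data and then applied as a unit test to each incoming batch. A production system would presumably need to choose among many candidate constraints, which is the role the abstract assigns to ML.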