Workshop: Data Centric AI

Evaluating Machine Learning Models for Internet Network Security with Data Slices


By using public data about the structure of the internet, practitioners can identify which internet-facing assets an organization owns, many of which are vulnerable to cybersecurity attacks. With up-to-date knowledge of which servers and other assets are exposed to the internet, organizations can remediate vulnerabilities before they are exploited. As part of managing an AI/ML system for this "internet asset attribution" task, we make extensive formal use of "data slices": subsets of data that share particular properties. Data slices make managing models and datasets more repeatable and sustainable. Data slice evaluation lets us systematically manage regressions, user trust, experiment evaluation, and data space characterization.
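To make the idea of slice-based evaluation concrete, here is a minimal sketch. It assumes each slice is defined as a boolean predicate over an example, computes accuracy separately within each slice, and flags slices where a candidate model scores below a baseline. The feature names (`tld`, `cloud`), the label convention (1 = asset attributed to the organization), and the helper names `slice_accuracy` and `find_regressions` are hypothetical illustrations, not the system described in this work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    # Hypothetical record: features of an internet asset plus a binary
    # label (1 = attributed to the organization, 0 = not attributed).
    features: dict
    label: int


# A data slice is modeled here as a predicate selecting a subset of examples.
Slice = Callable[[Example], bool]


def slice_accuracy(
    examples: List[Example], predictions: List[int], slices: Dict[str, Slice]
) -> Dict[str, float]:
    """Return accuracy within each non-empty slice."""
    results: Dict[str, float] = {}
    for name, predicate in slices.items():
        idx = [i for i, ex in enumerate(examples) if predicate(ex)]
        if not idx:
            continue  # skip empty slices rather than divide by zero
        correct = sum(predictions[i] == examples[i].label for i in idx)
        results[name] = correct / len(idx)
    return results


def find_regressions(
    baseline: Dict[str, float], candidate: Dict[str, float], tolerance: float = 0.0
) -> List[str]:
    """Return names of slices where the candidate is worse than the baseline by more than tolerance."""
    return sorted(
        name
        for name in baseline
        if name in candidate and candidate[name] < baseline[name] - tolerance
    )
```

For example, a candidate model can match the baseline overall on most slices yet regress on one specific slice; evaluating per slice surfaces exactly which subset got worse:

```python
examples = [
    Example({"tld": "com", "cloud": True}, 1),
    Example({"tld": "com", "cloud": False}, 0),
    Example({"tld": "org", "cloud": True}, 1),
    Example({"tld": "org", "cloud": False}, 1),
]
slices = {
    "all": lambda ex: True,
    "cloud": lambda ex: ex.features["cloud"],
    "tld_org": lambda ex: ex.features["tld"] == "org",
}
old = slice_accuracy(examples, [1, 0, 1, 1], slices)  # all slices at 1.0
new = slice_accuracy(examples, [1, 0, 1, 0], slices)  # "cloud" unchanged, "tld_org" drops
print(find_regressions(old, new))  # ['all', 'tld_org']
```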