
Lightning Talk
Workshop: Data Centric AI

A Data-Centric Image Classification Benchmark


High-quality labeled datasets are critical to advances in machine learning and tend to benefit all kinds of model-centric algorithms, such as novel architectures and loss functions. Labeling is usually labor-intensive and time-consuming because it involves many rounds of data selection, data cleaning, and data analysis. Much existing work addresses each of these steps individually, but there is little understanding of how to combine them and, most importantly, no standard testbed for comparing different dataset-improvement techniques. We therefore present the concept of a multi-domain benchmark for acquiring consistent labels with limited budgets. In contrast to most benchmarks, which encourage novel model-centric algorithms, our multi-domain data-centric benchmark encourages algorithms that improve the provided dataset. The proposed benchmark covers datasets with different resolutions, class distributions, and domains, ranging from biological to medical.