Lightning Talk
in
Workshop: Data Centric AI

Towards a Shared Rubric for Dataset Annotation


Abstract:

When arranging for third-party data annotation, it can be hard to compare how well competing providers apply best practices to create high-quality datasets. This leads to a "race to the bottom," where competition based solely on price makes it hard for vendors to charge for high-quality annotation. We propose a voluntary rubric which can be used (a) as a scorecard to compare vendors' offerings, (b) to communicate our expectations to vendors more clearly and consistently than today, (c) to justify the expense of choosing someone other than the lowest bidder, and (d) to encourage annotation providers to improve their practices.