
Workshop: Second Workshop on Efficient Natural Language and Speech Processing (ENLSP-II)

Can we get smarter than majority vote? Efficient use of individual rater’s labels for content moderation

Changho Shin · Alice Schoenauer-Sebag

Keywords: [ ENLSP-Main ]


A large number of natural language processing (NLP) datasets contain crowdsourced labels. Most of the time, training-set labels are generated by majority vote over individual raters' labels, which discards a significant amount of information. This work focuses on improving data efficiency when training a model for "marginally abusive" Tweet classification. We compare majority vote to two families of alternative methods that change the training process at two different steps: (1) aggregating individual labels using weak supervision to improve the quality of the training labels, and (2) predicting individual labels using the multi-rater models proposed by Davani et al. [2022]. We find that majority vote is a strong baseline. Dawid-Skene and multi-rater models perform well, although the latter tend to be more prone to overfitting. Finally, we also identify a number of practical considerations for the practitioner, such as setting a minimum number of labels per rater, or preferring soft to hard labels.
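To illustrate the label-aggregation step, the Dawid-Skene model jointly estimates each rater's confusion matrix and a posterior distribution over the true label of each item via expectation-maximization, rather than treating all raters as equally reliable. A minimal sketch (the function name, data layout, and iteration count are illustrative choices, not details from the paper):

```python
import numpy as np

def dawid_skene(labels, n_classes=2, n_iter=50):
    """Estimate true-label posteriors from per-rater labels via Dawid-Skene EM.

    labels: dict mapping item id -> {rater id: observed class label}
    Returns an (n_items, n_classes) array of posterior probabilities,
    which can be used directly as soft training labels.
    """
    items = sorted(labels)
    raters = sorted({r for ls in labels.values() for r in ls})
    n_items, n_raters = len(items), len(raters)
    r_idx = {r: j for j, r in enumerate(raters)}

    # Initialize posteriors with per-item label proportions (soft majority vote).
    T = np.zeros((n_items, n_classes))
    for i, item in enumerate(items):
        for lab in labels[item].values():
            T[i, lab] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per rater,
        # conf[r, t, l] = P(rater r reports l | true class t).
        prior = T.mean(axis=0) + 1e-9
        prior /= prior.sum()
        conf = np.full((n_raters, n_classes, n_classes), 1e-6)
        for i, item in enumerate(items):
            for r, lab in labels[item].items():
                conf[r_idx[r], :, lab] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute posteriors from priors and rater confusions.
        logT = np.tile(np.log(prior), (n_items, 1))
        for i, item in enumerate(items):
            for r, lab in labels[item].items():
                logT[i] += np.log(conf[r_idx[r], :, lab])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T
```

With two reliable raters and one who systematically flips labels, the EM iterations down-weight the unreliable rater, and the resulting posteriors give soft labels that are sharper than the raw vote proportions.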
