Lightning Talk
Workshop: Data Centric AI

Addressing Content Selection Bias in Creating Datasets for Hate Speech Detection


A key challenge in building a dataset for hate speech detection is that hate speech is relatively rare, meaning that random sampling of tweets to annotate is highly inefficient for finding hate speech. To address this, prior work often considers only tweets matching known “hate words”, but restricting the dataset to a pre-defined vocabulary captures only part of the real-world phenomenon we seek to model. Our key insight is that the rarity of hate speech is akin to the rarity of relevance in information retrieval (IR). This connection suggests that well-established methodologies for creating IR test collections can be usefully applied to build more inclusive datasets for hate speech. Applying this idea, we have created a new hate speech dataset for Twitter that provides broader coverage of hate, and we show a drop in accuracy of existing detection models when they are tested on these broader forms of hate. This short workshop paper highlights a longer work currently under review.
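To make the IR analogy concrete: in IR test collections, the top-ranked documents from many retrieval systems are pooled and then judged by human assessors, rather than judging a random sample. The sketch below shows that pooling idea applied to tweet selection for annotation. It is an illustrative assumption, not the authors' actual pipeline; the scoring functions (a lexicon matcher and a placeholder stand-in for a learned model) and all names are hypothetical.

```python
# Illustrative sketch (assumption, not the paper's code): IR-style pooling
# for selecting tweets to annotate. Each "system" is a scoring function;
# the union of every system's top-k forms the pool sent to annotators.

def top_k(tweets, score_fn, k):
    """Return the ids of the k tweets ranked highest by score_fn."""
    ranked = sorted(tweets, key=lambda t: score_fn(t["text"]), reverse=True)
    return [t["id"] for t in ranked[:k]]

def build_annotation_pool(tweets, score_fns, k):
    """Union of each system's top-k: the candidate pool for human judgment."""
    pool = set()
    for fn in score_fns:
        pool.update(top_k(tweets, fn, k))
    return pool

# Toy data; the lexicon and the second scorer are placeholders for
# real systems (e.g., a keyword matcher and a trained classifier).
tweets = [
    {"id": 1, "text": "have a great day"},
    {"id": 2, "text": "slur example here"},
    {"id": 3, "text": "implicitly hateful phrasing"},
]
lexicon = {"slur"}
lexicon_score = lambda text: sum(w in lexicon for w in text.split())
length_score = lambda text: len(text)  # stand-in for a model's score

pool = build_annotation_pool(tweets, [lexicon_score, length_score], k=1)
# Pooling surfaces tweets the lexicon alone would miss (id 3 here),
# which is the motivation for moving beyond a fixed "hate word" vocabulary.
```

The point of the sketch is that diversifying the selection systems diversifies the pool, so annotation effort covers forms of hate that any single keyword list would miss.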