
Workshop: Data Centric AI

Debiasing Pre-Trained Sentence Encoders With Word Dropouts on Fine-Tuning Data


Pre-trained neural language models are increasingly deployed for decision making in critical application domains. As demonstrated in previous work, these models often exhibit biases both in the learned representations and in the model predictions. While pre-trained sentence embeddings achieve superior performance on downstream tasks, most existing debiasing approaches are applicable only at the word level. In this paper, we propose a word dropout approach for pre-processing the fine-tuning data of pre-trained sentence encoders such as BERT. The goal is to selectively attenuate the contribution of words that are highly correlated with words associated with societal biases. Extensive evaluations on relevant datasets demonstrate that our approach outperforms a state-of-the-art baseline on two of three downstream tasks in terms of mitigating bias while maintaining high accuracy.
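The core idea — dropping words from the fine-tuning data with a probability tied to their correlation with bias-related attribute words — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `correlation` scores, the `p_max` scaling, and the toy attribute words are all assumptions; the paper defines its own bias word lists and correlation measure.

```python
import random

def word_dropout(tokens, correlation, p_max=0.5, seed=None):
    """Drop each token with probability proportional to its (assumed
    precomputed) correlation with bias-attribute words, so that highly
    bias-correlated context words contribute less to fine-tuning."""
    rng = random.Random(seed)
    kept = []
    for tok in tokens:
        # correlation maps a word to a score in [0, 1]; hypothetically
        # computed, e.g., as normalized co-occurrence with attribute
        # words (like gendered terms) on the fine-tuning corpus.
        p_drop = p_max * correlation.get(tok.lower(), 0.0)
        if rng.random() >= p_drop:
            kept.append(tok)
    return kept

# Toy, purely illustrative correlation scores.
corr = {"nurse": 0.9, "doctor": 0.8, "table": 0.0}
sentence = "The nurse spoke to the doctor at the table".split()
filtered = word_dropout(sentence, corr, seed=0)
```

Applied as a pre-processing step, each fine-tuning example is filtered this way before being fed to the sentence encoder, so the encoder sees bias-correlated words less often without any change to the model architecture.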