Contributed Talk
Workshop: InterNLP: Workshop on Interactive Learning for Natural Language Processing

InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

Zexue He


Debiasing methods for NLP models have traditionally focused on isolating information related to a sensitive attribute (such as gender or race). We argue instead that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We show that an interactive setup, in which users can provide feedback, achieves a better and fairer balance between task performance and bias mitigation, supported by faithful explanations.
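The interactive setup described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a toy bag-of-words scorer, and all names (`score`, `rationale`, `apply_feedback`) are hypothetical. The idea shown is only the feedback loop: the model exposes an explanation, the user flags a sensitive token, and the model's reliance on that token is adjusted rather than the token being silently removed.

```python
# Hypothetical sketch of an interactive debiasing loop (not the
# paper's method): a toy bag-of-words model explains its prediction,
# a user flags a biased token, and its weight is rescaled.

def score(tokens, weights):
    """Sum per-token weights to get a task score."""
    return sum(weights.get(t, 0.0) for t in tokens)

def rationale(tokens, weights, k=2):
    """Explain the prediction via the k highest-|weight| tokens."""
    return sorted(tokens, key=lambda t: -abs(weights.get(t, 0.0)))[:k]

def apply_feedback(weights, feedback):
    """User feedback: rescale the weights of tokens flagged as biased."""
    out = dict(weights)
    for token, factor in feedback.items():
        out[token] = out.get(token, 0.0) * factor
    return out

# Toy example: the gendered pronoun initially drives the explanation.
weights = {"nurse": 1.2, "she": 1.1, "skilled": 1.0}
tokens = ["she", "is", "a", "skilled", "nurse"]

before = rationale(tokens, weights)              # includes "she"
weights = apply_feedback(weights, {"she": 0.0})  # user flags "she"
after = rationale(tokens, weights)               # "she" no longer explains
```

After feedback, the sensitive token no longer dominates the explanation, while task-relevant tokens still contribute to the score, mirroring the "use sensitive information fairly" framing rather than deleting it outright.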
