Workshop: InterNLP: Workshop on Interactive Learning for Natural Language Processing
InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions
Debiasing methods for NLP models traditionally focus on isolating information related to a sensitive attribute (such as gender or race). We argue instead that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We show that an interactive setup in which users can provide feedback can achieve a better and fairer balance between task performance and bias mitigation, supported by faithful explanations.