Skip to yearly menu bar Skip to main content

Workshop: Trustworthy and Socially Responsible Machine Learning

Beyond Protected Attributes: Disciplined Detection of Systematic Deviations in Data

Adebayo Oshingbesan · Winslow Omondi · Girmaw Abebe Tadesse · Celia Cintas · Skyler D. Speakman


Finding systematic deviations of an outcome of interest in data and models is an important goal of trustworthy and socially responsible AI. To understand systematic deviations at a subgroup level, it is important to look beyond \emph{predefined} groups and consider all possible subgroups for analysis. Of course this exhaustive enumeration is not possible and there needs to be a balance of exploratory and confirmation analysis in socially-responsible AI. In this paper we compare recently proposed methods for detecting systematic deviations in an outcome of interest at the subgroup level across three socially-relevant data sets. Furthermore, we show the importance of looking through all possible subgroups for systematic deviations by comparing detected patterns using only protected attributes against patterns detected using the entire search space. One interesting pattern found in the OULAD dataset is that while having a high course load and not being from the highest socio-economic decile of UK regions makes students 2.3 times more likely to fail or withdraw from courses, being from Ireland or Wales mitigates this risk by 37%. This pattern may have been missed if we focused our analysis on the protected groups of gender and disability only. Python code for all methods, including the most recently proposed "AutoStrat" are available on open-sourced code repositories.

Chat is not available.