Workshop
Algorithmic Fairness through the Lens of Time
Awa Dieng · Miriam Rateike · Golnoosh Farnadi · Ferdinando Fioretto · Jessica Schrouff
Room 252 - 254
Schedule
Fri 7:00 a.m. - 7:10 a.m. | Opening remarks by organizers
Fri 7:10 a.m. - 7:40 a.m. | Invited Talk 1: Richard Zemel: A Framework for Responsible Deployment of Large Language Models
Fri 7:40 a.m. - 7:50 a.m. | Invited talk Q&A
Fri 7:50 a.m. - 8:00 a.m. | Contributed Talk 1: Backtracking Counterfactual Fairness (Lucius Bynum · Joshua Loftus · Julia Stoyanovich)
Fri 8:00 a.m. - 8:05 a.m. | Contributed Talk 1 Q&A
Fri 8:05 a.m. - 8:15 a.m. | Contributed Talk 2: Designing Long-term Group Fair Policies in Dynamical Systems (Miriam Rateike · Isabel Valera · Patrick Forré)
Fri 8:15 a.m. - 8:20 a.m. | Contributed Talk 2 Q&A
Fri 8:20 a.m. - 9:00 a.m. | Coffee break and poster session 1
Fri 9:00 a.m. - 9:30 a.m. | Invited Talk 2: Celestine Mendler-Dünner: Performativity and Power in Prediction
Fri 9:30 a.m. - 9:45 a.m. | Invited talk Q&A
Fri 9:45 a.m. - 11:00 a.m. | Roundtables
Fri 11:00 a.m. - 11:03 a.m. | Information-Theoretic Bounds on The Removal of Attribute-Specific Bias From Neural Networks (Spotlight)
Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for predictions is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. In this work, we mathematically and empirically reveal an important limitation of attribute bias removal methods in the presence of strong bias. Specifically, we derive a general non-vacuous information-theoretic upper bound on the performance of any attribute bias removal method in terms of the bias strength. We provide extensive experiments on synthetic, image, and census datasets to verify the theoretical bound and its consequences in practice. Our findings show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak, thus cautioning against the use of these methods in smaller datasets where strong attribute bias can occur, and advocating the need for methods that can overcome this limitation.
Jiazhi Li · Mahyar Khayatkhoei · Jiageng Zhu · Hanchen Xie · Mohamed Hussein · Wael Abd-Almageed
Fri 11:00 a.m. - 11:03 a.m. | Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification (Spotlight)
Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
A. Feder Cooper · Katherine Lee · Madiha Choksi · Solon Barocas · Christopher De Sa · James Grimmelmann · Jon Kleinberg · Siddhartha Sen · Baobao Zhang
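A minimal sketch of the variance-based abstention idea described in the abstract, assuming scikit-learn-style models; the bootstrap ensemble, the majority-vote proxy for self-consistency, and the 0.9 threshold are illustrative choices made here, not the authors' released toolkit:

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, y_train, X_test = X[:1500], y[:1500], X[1500:]

def bootstrap_predictions(base_model, X_tr, y_tr, X_te, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_tr), size=len(X_tr))   # bootstrap resample
        model = clone(base_model).fit(X_tr[idx], y_tr[idx])
        preds.append(model.predict(X_te))
    return np.array(preds)                                  # shape (n_models, n_test)

preds = bootstrap_predictions(LogisticRegression(max_iter=1000),
                              X_train, y_train, X_test)
# Self-consistency proxy: how often the models agree with their own majority label.
majority_rate = np.maximum(preds.mean(axis=0), 1 - preds.mean(axis=0))
abstain = majority_rate < 0.9                               # near-arbitrary predictions
final = np.where(abstain, -1, (preds.mean(axis=0) >= 0.5).astype(int))
print(f"abstention rate: {abstain.mean():.2%}")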
Fri 11:00 a.m. - 11:03 a.m. | Procedural Fairness Through Decoupling Objectionable Data Generating Components (Spotlight)
We reveal and address the frequently overlooked yet important issue of disguised procedural unfairness, namely, the potentially inadvertent alterations on the behavior of neutral (i.e., not problematic) aspects of the data generating process, and/or the lack of procedural assurance of the greatest benefit of the least advantaged individuals. Inspired by John Rawls's advocacy for pure procedural justice (Rawls, 1971; 2001), we view automated decision-making as a microcosm of social institutions, and consider how the data generating process itself can satisfy the requirements of procedural fairness. We propose a framework that decouples the objectionable data generating components from the neutral ones by utilizing reference points and the associated value instantiation rule. Our findings highlight the necessity of preventing disguised procedural unfairness, drawing attention not only to the objectionable data generating components that we aim to mitigate, but also, more importantly, to the neutral components that we intend to keep unaffected.
Zeyu Tang · Jialu Wang · Yang Liu · Peter Spirtes · Kun Zhang
Fri 11:00 a.m. - 11:03 a.m. | Exploring Predictive Arbitrariness as Unfairness via Predictive Multiplicity and Predictive Churn (Spotlight)
For models to be fair, predictions should not be arbitrary. Predictions can be considered arbitrary if small perturbations in the training data or model specification result in changed decisions for some individuals. In this context, predictive multiplicity, or predictive variation over a set of near-optimal models, has been proposed as a key measure of arbitrariness. Separate from fairness research, another type of predictive inconsistency arises in the context of models that are continuously updated with new data. In this setting, the instability metric is predictive churn: expected prediction flips over two models trained consecutively. Interestingly, these streams of research and measures of predictive inconsistency have been studied largely independently, although sometimes conflated. In this paper, we review these notions and study their similarities and differences on real datasets. We find that they do in fact measure distinct notions of arbitrariness, that they are not immediately mitigated by using uncertainty-aware prediction methods, and that they both exhibit strong dependence on both data and model specification.
Jamelle Watson-Daniels · Lance Strait · Mehadi Hassen · Amy Skerry-Ryan · Alexander D'Amour
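The churn definition above ("expected prediction flips over two models trained consecutively") and a crude multiplicity proxy can be sketched in a few lines; the dataset, the model class, and the seed-based stand-in for "near-optimal models" are assumptions for demonstration, not the paper's experimental setup:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, random_state=0)
X_old, y_old = X[:1000], y[:1000]           # data available at time t
X_new, y_new = X[:2000], y[:2000]           # data after an update at time t+1
X_eval = X[2000:]

# Predictive churn: fraction of evaluation points whose label flips between
# two consecutively trained models.
m_old = GradientBoostingClassifier(random_state=0).fit(X_old, y_old)
m_new = GradientBoostingClassifier(random_state=0).fit(X_new, y_new)
churn = np.mean(m_old.predict(X_eval) != m_new.predict(X_eval))

# Predictive multiplicity (crude proxy): disagreement across models that differ
# only in their random seed.
models = [GradientBoostingClassifier(random_state=s).fit(X_new, y_new) for s in range(10)]
votes = np.array([m.predict(X_eval) for m in models])
ambiguity = np.mean(votes.min(axis=0) != votes.max(axis=0))   # any disagreement per point

print(f"churn={churn:.3f}  ambiguity={ambiguity:.3f}")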
Fri 11:00 a.m. - 11:03 a.m. | Improving Fairness-Accuracy tradeoff with few Test Samples under Covariate Shift (Spotlight)
Covariate shift in the test data can significantly degrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups in such settings is of paramount importance due to societal implications like criminal justice. We operate under the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available. Towards this problem, we make three contributions. First is a novel composite weighted entropy based objective for prediction accuracy which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the Pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. Asymmetric covariate shift occurs when the distribution of covariates of one group shifts significantly compared to the other groups, and this happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, we show that our proposed method significantly outperforms them. Our third contribution is theoretical, where we show that our weighted entropy term along with prediction loss on the training set approximates test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance which affects many other baselines.
Shreyas Havaldar · Jatin Chauhan · Karthikeyan Shanmugam · Jay Nandy · Aravindan Raghuveer
Fri 11:00 a.m. - 11:03 a.m. | Loss Modeling for Multi-Annotator Datasets (Spotlight)
Accounting for the opinions of all annotators of a dataset is critical for fairness. However, when annotating large datasets, individual annotators will frequently provide thousands of ratings which can lead to fatigue. Additionally, these annotation processes can occur over multiple days which can lead to an inaccurate representation of an annotator's opinion over time. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, we demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.
Uthman Jinadu · Jesse Annan · Shanshan Wen · Yi Ding
Fri 11:00 a.m. - 11:03 a.m. | Measuring fairness of synthetic oversampling on credit datasets (Spotlight)
Machine learning models often face performance issues due to class imbalance, a common problem characterized by datasets that are biased towards a so-called majority class. Oversampling the minority class through synthetic generators has become a popular solution for balancing data, giving rise to many rebalancing techniques, such as ADASYN and SMOTE. Practitioners usually lean on performance metrics in order to either refute or advocate for the adoption of some resampling method. However, considering the increasing ethical and legal demands for fair machine learning models, it is important to test the neutrality of these methods with respect to fairness. We conducted an investigation of the effects of oversampling on gender bias by analyzing statistical parity difference (SPD) and equal opportunity difference (EOD) obtained from four credit datasets. As with performance, the fairness impact caused by synthetic minority oversampling proved to be more significant for weak classifiers. Our results suggest that synthetic oversampling should be used with caution in order to avoid amplifying or even creating biased data.
Decio Miranda Filho · Thalita Veronese · Marcos M. Raimundo
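For reference, the two metrics reported above have standard textbook definitions that can be computed directly; the sign convention and the 0/1 group encoding below are choices of this sketch, not of the paper:

import numpy as np

def statistical_parity_difference(y_pred, group):
    # P(Y_hat = 1 | unprivileged group) - P(Y_hat = 1 | privileged group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    # TPR(unprivileged group) - TPR(privileged group)
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return tpr(0) - tpr(1)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_pred = rng.integers(0, 2, 500)
group = rng.integers(0, 2, 500)      # e.g., 1 = privileged gender group
print(statistical_parity_difference(y_pred, group),
      equal_opportunity_difference(y_true, y_pred, group))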
Fri 11:00 a.m. - 11:03 a.m. | Transparency Through the Lens of Recourse and Manipulation (Spotlight)
Individuals often seek to reverse undesired outcomes in interactions with automated systems, such as loan denials, by modifying their features. These reversions can occur through either system-recommended actions, known as "recourse", or through manipulation actions such as misreporting feature values. Providing recourse can benefit users by enabling feature improvements (e.g., improving creditworthiness by paying off debt) and enhance the system's own utility (e.g., by creating more creditworthy individuals to whom the system can lend). However, providing recourse also increases the transparency of the decision rule and thus introduces opportunities for strategic individuals to better exploit the system; this is particularly true when groups of agents share information (e.g., sharing graduate school admission information on websites such as GradCafe). This natural tension will ultimately decide whether or not the system elects to provide recourse. This differs from the current literature, which presumes the system's willingness to provide recourse without investigating the rationality of such readiness. To address this gap, we propose a framework through which the interplay of transparency, recourse, and manipulation can be investigated. Within this framework, we demonstrate that a rational system is frequently incentivized to provide only a small fraction of agents with recourse actions. We capture the social cost of the system's hesitance to provide recourse and demonstrate that rational behavior of the system results in a systemic decrease in the population's total utility. Further, we find that this utility decrease can fall disproportionately on sensitive groups within the population (such as those defined by race or gender).
Yatong Chen · Andrew Estornell · Yevgeniy Vorobeychik · Yang Liu
Fri 11:00 a.m. - 11:03 a.m. | Variation of Gender Biases in Visual Recognition Models Before and After Finetuning (Spotlight)
We introduce a framework to measure how biases change before and after fine-tuning a large scale visual recognition model for a downstream task. Deep learning models trained on increasing amounts of data are known to encode societal biases. Many computer vision systems today rely on models typically pretrained on large scale datasets. While bias mitigation techniques have been developed for tuning models for downstream tasks, it is currently unclear what the effects of biases already encoded in a pretrained model are. Our framework incorporates sets of canonical images representing individual and pairs of concepts to highlight changes in biases for an array of off-the-shelf pretrained models across model sizes, dataset sizes, and training objectives. Through our analyses, we find that (1) supervised models trained on datasets such as ImageNet-21k are more likely to retain their pretraining biases regardless of the target dataset compared to self-supervised models. We also find that (2) models finetuned on larger scale datasets are more likely to introduce new biased associations. Our results also suggest that (3) biases can transfer to finetuned models and the finetuning objective and dataset can impact the extent of transferred biases.
Jaspreet Ranjit · Tianlu Wang · Baishakhi Ray · Vicente Ordonez
Fri 11:00 a.m. - 12:00 p.m. | Lunch break
Fri 11:03 a.m. - 11:06 a.m. | On Comparing Fair classifiers under Data Bias (Spotlight)
In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases; 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data; and 3. A few simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
Mohit Sharma · Amit Deshpande · Rajiv Ratn Shah
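A hedged sketch of the kind of bias injection audited above, loosely in the spirit of the Blum & Stangl (2019) model as summarized in the abstract; the function name, parameter names, rates, and group encoding are illustrative assumptions, not the paper's code:

import numpy as np

def inject_bias(X, y, group, beta_under=0.5, nu_label=0.2, seed=0):
    """Under-representation: drop a fraction beta_under of positive examples
    from the disadvantaged group (group == 0). Label bias: flip a fraction
    nu_label of the remaining positives in that group to negative."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    keep = np.ones(len(y), dtype=bool)
    disadvantaged_pos = np.where((group == 0) & (y == 1))[0]
    drop = rng.choice(disadvantaged_pos,
                      size=int(beta_under * len(disadvantaged_pos)), replace=False)
    keep[drop] = False
    remaining_pos = np.where(keep & (group == 0) & (y == 1))[0]
    flip = rng.choice(remaining_pos,
                      size=int(nu_label * len(remaining_pos)), replace=False)
    y[flip] = 0
    return X[keep], y[keep], group[keep]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
group = rng.integers(0, 2, 1000)
y = rng.integers(0, 2, 1000)
Xb, yb, gb = inject_bias(X, y, group)
print(len(y), "->", len(yb), "examples;",
      round(y[group == 0].mean(), 3), "->", round(yb[gb == 0].mean(), 3),
      "positive rate in group 0")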
Fri 11:03 a.m. - 11:06 a.m. | Reevaluating COMPAS: Base Rate Tracking and Racial Bias (Spotlight)
COMPAS is a controversial Recidivism Assessment Instrument (RAI) that has been used in the US criminal justice system to predict recidivism in pretrial settings. Angwin et al. (2016) argued that COMPAS is biased against Blacks because it violates a fairness criterion known as equalized odds. However, COMPAS satisfies another two prominent fairness criteria known as weak calibration and predictive parity, which are known to be inconsistent with equalized odds in most realistic settings. Eva (2022) argues that weak calibration is not sufficient for algorithmic fairness and claims that a different criterion, base rate tracking, is at least a necessary condition. In this paper, we present four different natural ways of measuring how badly COMPAS violates base rate tracking, i.e. how much the average predicted risk scores across ethnic groups deviate from their actual recidivism prevalence. We find significant deviations in all cases and argue that advocates of base rate tracking do indeed have good reason to be concerned about racial bias in COMPAS. Our interdisciplinary work concludes by raising some further normative questions that remain unanswered by our analysis.
Victor Crespo · Javier Rando · Benjamin Eva · Vijay Keswani · Walter Sinnott-Armstrong
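One direct instantiation of the comparison the abstract describes (each group's mean predicted risk versus its observed recidivism prevalence); the COMPAS-style column names and the decile rescaling are assumptions of this sketch, and the paper itself studies four such measures:

import pandas as pd

def base_rate_gap(df, score_col="decile_score", label_col="two_year_recid",
                  group_col="race"):
    # Rescale decile scores (1-10) to [0, 1] so they are comparable to prevalence.
    risk = (df[score_col] - 1) / 9.0
    out = df.assign(risk=risk).groupby(group_col).agg(
        mean_risk=("risk", "mean"), base_rate=(label_col, "mean"))
    out["gap"] = out["mean_risk"] - out["base_rate"]
    return out

demo = pd.DataFrame({
    "race": ["A", "A", "B", "B", "B"],
    "decile_score": [7, 5, 4, 2, 3],
    "two_year_recid": [1, 0, 1, 0, 0],
})
print(base_rate_gap(demo))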
Fri 11:03 a.m. - 11:06 a.m. | Performativity and Prospective Fairness (Spotlight)
Deploying an algorithmically informed policy is a significant intervention in the structure of society. As is increasingly acknowledged, predictive algorithms have performative effects: using them can shift the distribution of social outcomes away from the one on which the algorithms were trained. Algorithmic fairness research is usually motivated by the worry that these performative effects will exacerbate the structural inequalities that gave rise to the training data. However, standard retrospective fairness methodologies are ill-suited to predict these effects. They impose static fairness constraints that hold after the predictive algorithm is trained, but before it is deployed and, therefore, before performative effects have had a chance to kick in. However, satisfying static fairness criteria after training is not sufficient to avoid exacerbating inequality after deployment. Addressing the fundamental worry that motivates algorithmic fairness requires explicitly comparing the change in relevant structural inequalities before and after deployment. We propose a prospective methodology for estimating this post-deployment change from pre-deployment data and knowledge about the algorithmic policy. That requires a strategy for distinguishing between, and accounting for, different kinds of performative effects. In this paper, we focus on the algorithmic effect on the causally downstream outcome variable. Throughout, we are guided by an application from public administration: the use of algorithms to (1) predict who among the recently unemployed will stay unemployed for the long term and (2) target them with labor market programs. We illustrate our proposal by showing how to predict whether such policies will exacerbate gender inequalities in the labor market.
Sebastian Zezulka · Konstantin Genin
Fri 11:03 a.m. - 11:06 a.m. | Explaining knock-on effects of bias mitigation (Spotlight)
In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups. Bias mitigation methods work in different ways and have known "waterfall" effects, e.g., mitigating bias at one place may manifest bias elsewhere. In this paper, we aim to characterise impacted cohorts when mitigation interventions are applied. To do so, we treat intervention effects as a classification task and learn an explainable meta-classifier to identify cohorts that have altered outcomes. We examine a range of bias mitigation strategies that work at various stages of the model life cycle. We empirically demonstrate that our meta-classifier performs well in uncovering impacted cohorts. Further, we show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e. people who receive unfavourable outcomes solely on account of mitigation efforts. This is despite improvement in fairness metrics. We use these results as a basis to argue for more careful audits of static mitigation interventions that go beyond aggregate metrics.
Svetoslav Nizhnichenkov · Rahul Nair · Elizabeth Daly · Brian Mac Namee
Fri 11:03 a.m. - 11:06 a.m. | Addressing The Cost Of Fairness In A Data Market Over Time (Spotlight)
It is well understood that the data generation process is a critical factor that shapes the fairness of a machine-learning system. Since data generation is often mediated by a data market, we ask whether machine-learning fairness can be addressed in data markets as they evolve and, if so, at what cost. We revisit a well-known model of a data market in which data are allocated by a centralized marketplace. If the marketplace decides to enforce fairness, the main question is whether the natural extraction of value from data under a fairness intervention is further constrained and who is affected by it. In a natural class of allocation functions and under mild conditions, we show that no agent in the data market asymptotically loses utility as the market expands to include more buyers, even if the cost of data production is inherently biased against individuals of a particular group. Our initial results suggest that, under certain conditions, the evolution of a system may be a useful tool to address the cost of fairness.
Augustin Chaintreau · Roland Maio · Juba Ziani
Fri 11:03 a.m. - 11:06 a.m. | On Mitigating Unconscious Bias through Bandits with Evolving Biased Feedback (Spotlight)
Media stereotypes, cultural stereotypes, and affinity bias are some of the driving factors shaping our unconscious biases. As the demographic landscape of the workforce evolves, this bias is subject to change, and in particular could be erased or inverted (e.g. computer programming was considered a "woman's job" in the US in the 1940s). To study this feedback loop between workforce demographics and bias, we introduce a multi-armed bandit model for which we only perceive a time-dependent biased reward, which is a function of the (evolving) fraction of times we picked each arm. We show that if we ignore the bias, UCB incurs linear regret in this setting. By contrast, when the bias model is exactly known, then an elimination-style algorithm achieves a regret at most $K^2$ times larger than in the standard, unbiased bandit setting. Moreover, we show that this regret scaling is (essentially) unimprovable by deriving a new instance-dependent regret lower bound which is roughly $K^2$ times larger than in the standard bandit setting, even in the setting where the policy knows the bias model exactly. To obtain this lower bound when the observed reward distributions are (i) time-varying and (ii) dependent on the policy's past actions, we develop new proof techniques beyond the standard bandit lower bound arguments, which may be of independent interest. In particular, we identify a "bottleneck" set of actions for which any policy must either (a) play many times, or (b) observe significantly biased samples. Then, using a stopped version of the divergence decomposition, we carefully construct a stopping time which allows us to translate cases (a) and (b) into an amplified lower bound.
Matthew Faw · Constantine Caramanis · Sanjay Shakkottai · Jessica Hoffmann
Fri 11:03 a.m. - 11:06 a.m. | Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness (Spotlight)
Over the last few years the importance of algorithmic fairness in machine learning has gained more and more traction and developed into a flourishing field of study. However, there still exists a gap between theoretical research on algorithmic fairness and its implementation in practice. Here, we show the importance of addressing this gap by demonstrating how algorithmic fairness heavily depends on the decisions made during a system's design and implementation, as biases in data can be mitigated or reinforced along the typical modeling pipeline. We present a framework that can aid in the design of robust real-world applications and help to inform the future study of algorithmic fairness. Drawing on insights from the field of psychology, we introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their implications on fairness. By combining decisions, we create a grid of all possible “universes” of decision combinations. Using the resulting dataset, we can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health care coverage of vulnerable populations for potential interventions.
Jan Simson · Florian Pfisterer · Christoph Kern
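The "grid of universes" construction can be sketched in a few lines; the decision options and the placeholder evaluation function below are illustrative stand-ins, not the paper's actual pipeline or case study:

from itertools import product
import pandas as pd

decisions = {
    "imputation": ["mean", "median", "drop_rows"],
    "encoding":   ["one_hot", "target"],
    "model":      ["logreg", "random_forest"],
    "threshold":  [0.4, 0.5, 0.6],
}

def fit_and_evaluate(**config):
    # Placeholder: a real multiverse analysis would fit the pipeline described
    # by `config` and return accuracy plus a group-fairness metric.
    return {"accuracy": float("nan"), "spd": float("nan")}

universes = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
multiverse = pd.DataFrame([{**u, **fit_and_evaluate(**u)} for u in universes])
print(len(multiverse), "universes")   # 3 * 2 * 2 * 3 = 36 decision combinations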
Fri 11:03 a.m. - 11:06 a.m. | Fairer and More Accurate Models Through NAS (Spotlight)
Making models algorithmically fairer in tabular data has been long studied, with techniques typically oriented towards fixes which usually take a neural model with an undesirable outcome and make changes to how the data are ingested, what the model weights are, or how outputs are processed. We employ an emergent and different strategy where we consider updating the model's architecture and training hyperparameters to find an entirely new model with better outcomes from the beginning of the debiasing procedure. In this work, we propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO) in the first application to the very challenging domain of tabular data. We conduct extensive exploration of architectural and hyperparameter spaces (MLP, ResNet, and FT-Transformer) across diverse datasets, demonstrating the dependence of accuracy and fairness metrics of model predictions on hyperparameter combinations. We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns. We propose a novel approach that jointly optimizes architectural and training hyperparameters in a multi-objective constraint of both accuracy and fairness. We produce architectures that consistently Pareto dominate state-of-the-art bias mitigation methods either in fairness, accuracy or both, all of this while being Pareto-optimal over hyperparameters achieved through single-objective (accuracy) optimization runs. This research underscores the promise of automating fairness and accuracy optimization in deep learning models.
Richeek Das · Samuel Dooley
Fri 11:03 a.m. - 11:06 a.m. | Causal Dependence Plots (Spotlight)
Explaining artificial intelligence or machine learning models is increasingly important. To use such data-driven systems wisely we must understand how they interact with the world, including how they depend causally on data inputs. In this work we develop Causal Dependence Plots (CDPs) to visualize how one variable---a predicted outcome---depends on changes in another variable---a predictor---along with consequent causal changes in other predictor variables. Crucially, this may differ from standard methods based on holding other predictors constant or assuming they are independent, such as regression coefficients or Partial Dependence Plots (PDPs). CDPs use an auxiliary causal model to produce explanations because causal conclusions require causal assumptions. Our explanatory framework generalizes PDPs, including them as a special case, and enables a variety of other custom interpretive plots to show, for example, the total, direct, and indirect effects of causal mediation. We demonstrate with simulations and real data experiments how CDPs can be combined in a modular way with methods for causal learning or sensitivity analysis. Since people often think causally about input-output dependence, CDPs can be powerful tools in the xAI or interpretable machine learning toolkit and contribute to applications like scientific machine learning and algorithmic fairness.
Joshua Loftus · Lucius Bynum · Sakina Hansen
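A toy contrast between a PDP-style curve and a CDP-style curve under a known structural model; the linear toy model and the auxiliary causal equations here are assumptions for illustration, not the authors' implementation:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=n)        # toy causal model: x1 -> x2
y = x1 + x2 + rng.normal(scale=0.1, size=n)
model = LinearRegression().fit(np.column_stack([x1, x2]), y)

grid = np.linspace(-2, 2, 9)
# PDP-style curve: move x1, hold x2 at its observed values.
pdp = [model.predict(np.column_stack([np.full(n, v), x2])).mean() for v in grid]
# CDP-style curve: move x1 and propagate its causal effect on x2.
cdp = [model.predict(np.column_stack([np.full(n, v),
                                      2.0 * v + rng.normal(scale=0.5, size=n)])).mean()
       for v in grid]
print(np.round(pdp, 2))   # slope roughly 1 (direct effect only)
print(np.round(cdp, 2))   # slope roughly 3 (direct plus mediated effect through x2)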
Fri 11:06 a.m. - 11:09 a.m. | Fairness in link analysis ranking algorithms (Spotlight)
In this paper, we study the problem of fairness in link analysis algorithms in evolving networks. In particular, we formally show that minority groups can get under-represented in ranking algorithms such as HITS and Pagerank, in networks that evolve over time. We show that under-representation does not come out of nowhere, but biased networks can create even more biased rankings: we use an evolving network model with multiple communities to show that homophily plays a central role in amplifying bias against minority groups in rankings based on HITS. We derive a theoretical approximation to show that bias increases in more homophilic networks, showing that the authority scores resulting from applying the HITS algorithm effectively push minorities even further down in the ranking as compared to the degree ranking. The use of evolving networks is particularly important in two ways: (1) to show that such algorithms do not get deployed on static content, but on ever-evolving nodes and links that have a temporal aspect; (2) the scores that link analysis algorithms output are often used as features in learning-to-rank algorithms, implying that biased features will have a lasting effect on the fairness of many ranking schemes. We illustrate our theoretical analysis on both synthetic and real datasets.
Ana-Andreea Stoica · Augustin Chaintreau · Nelly Litvak
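A toy version of the ranking comparison described above, using networkx's HITS implementation on a small homophilic directed graph; the stochastic block model parameters and group sizes are arbitrary stand-ins for the paper's evolving-network model:

import networkx as nx

# Two communities: 80% majority (group 0), 20% minority (group 1), with denser
# within-group than between-group links (homophily).
G = nx.stochastic_block_model([80, 20], [[0.10, 0.01], [0.01, 0.10]],
                              seed=0, directed=True)
group = {n: (0 if n < 80 else 1) for n in G.nodes}

hubs, authorities = nx.hits(G, max_iter=1000)
by_authority = sorted(G.nodes, key=authorities.get, reverse=True)
by_degree = sorted(G.nodes, key=dict(G.in_degree).get, reverse=True)

top_k = 20
frac = lambda ranking: sum(group[n] for n in ranking[:top_k]) / top_k
print("minority share in top-20 (authority):", frac(by_authority))
print("minority share in top-20 (in-degree):", frac(by_degree))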
Fri 11:06 a.m. - 11:09 a.m. | A Causal Perspective on Label Bias (Spotlight)
A common setting for algorithmic decision making relies on the use of a prediction of a proxy label to decide on a specific course of action or to make a downstream decision, such as the enrollment of a patient in a care management program based on prediction of their expected healthcare expenditure. Proxy labels are used because the true label of interest may be difficult or impossible to measure in practice. However, the use of a proxy label may propagate equity-related harms when the relationship between the unmeasured true label and the proxy label differs across subgroups (e.g. by
Vishwali Mhasawade · Alexander D'Amour · Stephen Pfohl
Fri 11:06 a.m. - 11:09 a.m. | Remembering to Be Fair: On Non-Markovian Fairness in Sequential Decision Making (Spotlight)
Fair decision making has largely been studied with respect to a single decision. In this paper we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of a decision. In this setting, we observe that fairness often depends on the history of the sequential decision making process and not just on the current state. To advance our understanding of this class of fairness problems, we define the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term fairness and anytime fairness. We further explore the interplay between non-Markovian fairness and memory, and how this can support construction of fair policies in sequential decision-making settings.
Parand A. Alamdari · Toryn Klassen · Elliot Creager · Sheila McIlraith
Fri 11:06 a.m. - 11:09 a.m. | FAIR-Ensemble: Homogeneous Deep Ensembling Naturally Attenuates Disparate Group Performances (Spotlight)
Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, we observe that even with a simple homogeneous ensemble --all the individual DNNs share the same training set, architecture, and design choices-- the minority group performance disproportionately improves with the number of models compared to the majority group, i.e. fairness naturally emerges from ensembling. Even more surprising, we find that this gain keeps occurring even when a large number of models is considered, e.g. 20, despite the fact that the average performance of the ensemble plateaus with fewer models. Our work establishes that simple DNN ensembles can be a powerful tool for alleviating disparate impact from DNN classifiers, thus curbing algorithmic harm. We also explore why this is the case. We find that even in homogeneous ensembles, varying the sources of stochasticity through parameter initialization, mini-batch sampling, and data-augmentation realizations, results in different fairness outcomes.
Wei-Yin Ko · Daniel Dsouza · Karina Nguyen · Randall Balestriero · Sara Hooker
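The homogeneous-ensemble effect can be probed with a small stand-in experiment (same data, same architecture, different seeds); the scikit-learn MLP, the synthetic subgroup definition, and the ensemble sizes below are illustrative assumptions, not the paper's DNN setup:

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
group = (X[:, 0] > 1.0).astype(int)          # synthetic "minority" subgroup
X_tr, y_tr = X[:3000], y[:3000]
X_te, y_te, g_te = X[3000:], y[3000:], group[3000:]

probas = []
for seed in range(10):                       # same data and architecture, new seed
    m = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=seed)
    probas.append(m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

for k in (1, 5, 10):                         # growing homogeneous ensemble
    pred = (np.mean(probas[:k], axis=0) >= 0.5).astype(int)
    acc = lambda mask: (pred[mask] == y_te[mask]).mean()
    print(f"k={k}: majority acc={acc(g_te == 0):.3f}, minority acc={acc(g_te == 1):.3f}")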
Fri 11:06 a.m. - 11:09 a.m. | Fair Clustering: Critique and Future Directions (Spotlight)
Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature in fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty in accounting for the downstream effects of a fair clustering algorithm in machine learning settings. In some cases, we demonstrate examples where the application of a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering.
John Dickerson · Seyed Esmaeili · Jamie Morgenstern · Claire Jie Zhang
Fri 11:06 a.m. - 11:09 a.m. | Seller-side Outcome Fairness in Online Marketplaces (Spotlight)
This paper aims to investigate and address the issue of seller-side fairness within online marketplaces, where many sellers and their items are not sufficiently exposed to customers in an e-commerce platform. This phenomenon raises concerns regarding the potential loss of revenue associated with less exposed items as well as reduced marketplace diversity. We introduce the notion of seller-side outcome fairness and build an optimization model to balance collected recommendation rewards and the fairness measure. We then propose a gradient-based data-driven algorithm based on duality and bandit theory. Our numerical experiments on real e-commerce data sets show that our algorithm can lift seller fairness measures while not hurting metrics like collected Gross Merchandise Value (GMV) and click-through rate (CTR).
Zikun Ye · Reza Yousefi Maragheh · Lalitesh Morishetti · Shanu Vashishtha · Jason Cho · Kaushiki Nag · Sushant Kumar · Kannan Achan
Fri 11:06 a.m. - 11:09 a.m. | Mitigating stereotypical biases in text to image generative systems (Spotlight)
State-of-the-art generative text-to-image models are known to exhibit social biases and over-represent certain groups like people of perceived lighter skin tones and men in their outcomes. In this work, we propose a method to mitigate such biases and ensure that the outcomes are fair across different groups of people. We do this by fine-tuning text-to-image models on synthetic data that varies in perceived skin tones and genders constructed from diverse text prompts. These text prompts are constructed from multiplicative combinations of ethnicities, genders, professions, age groups, and so on, resulting in diverse synthetic data. Our diversity fine-tuned (DFT) model improves the group fairness metric by 150% for perceived skin tone and 97.7% for perceived gender. Compared to baselines, DFT models generate more people with perceived darker skin tone and more women. To foster open research, we will release all text prompts and code to generate training images.
Piero Esposito · Parmida Atighehchian · Anastasis Germanidis · Deepti Ghadiyaram
Fri 11:06 a.m. - 11:09 a.m. | Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework (Spotlight)
While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Rényi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.
Sina Baharlouei · Meisam Razaviyayn
Fri 11:10 a.m. - 11:20 a.m. | Designing Long-term Group Fair Policies in Dynamical Systems (Oral)
Neglecting the effect that decisions have on individuals (and thus, on the underlying data distribution) when designing algorithmic decision-making policies may increase inequalities and unfairness in the long term—even if fairness considerations were taken in the policy design process. In this paper, we propose a novel framework for achieving long-term group fairness in dynamical systems, in which current decisions may affect an individual’s features in the next step, and thus, future decisions. Specifically, our framework allows us to identify a time-independent policy that converges, if deployed, to the targeted fair stationary state of the system in the long term, independently of the initial data distribution. We model the system dynamics with a time-homogeneous Markov chain and optimize the policy leveraging the Markov chain convergence theorem to ensure unique convergence. We provide examples of different targeted fair states of the system, encompassing a range of long-term goals for society and policy makers. Furthermore, we show how our approach facilitates the evaluation of different long-term targets by examining their impact on the group-conditional population distribution in the long term and how it evolves until convergence.
Miriam Rateike · Isabel Valera · Patrick Forré
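A toy numeric illustration of the convergence argument above: a fixed (time-independent) policy induces a time-homogeneous transition matrix, and repeated application drives any initial distribution to the same stationary state. The 3-state chain below is arbitrary and only meant to show the mechanism:

import numpy as np

P = np.array([[0.7, 0.2, 0.1],      # policy-induced transition probabilities
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

for mu0 in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])):
    mu = mu0
    for _ in range(200):
        mu = mu @ P                  # one step of the population dynamics
    print(np.round(mu, 4))           # same stationary distribution from either start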
Fri 11:20 a.m. - 11:30 a.m. | Backtracking Counterfactual Fairness (Oral)
In this work, we introduce backtracking counterfactual fairness, a novel definition of counterfactual fairness that uses backtracking rather than interventional counterfactuals. This definition captures the following intuition: would changing your predicted outcome place an undue burden on you? Our definition is compatible with different normative choices about what constitutes an undue burden. Backtracking counterfactuals, unlike interventional counterfactuals, consider counterfactual worlds in which the causal mechanisms remain unchanged. This allows backtracking counterfactual fairness to avoid one of the key sociological and normative tensions running through other counterfactual-based fairness notions: modularity. We demonstrate how our proposal relates to other notions of fairness and fair recourse on both real and simulated data, suggesting a novel way to make use of causal information for more equitable decision making and a possible path to considering counterfactual-based fairness notions even in the presence of non-modular variables.
Lucius Bynum · Joshua Loftus · Julia Stoyanovich
Fri 11:30 a.m. - 11:40 a.m. | Learning in reverse causal strategic environments with ramifications on two sided markets (Oral)
Motivated by equilibrium models of labor markets, we develop a formulation of causal strategic classification in which strategic agents can directly manipulate their outcomes. As an application, we consider an employer that seeks to anticipate the strategic response of the labor force when developing a hiring policy. We show theoretically that such performative (optimal) hiring policies improve employer and labor force welfare (compared to employers that do not anticipate the strategic labor force response) in the classic Coate-Loury labor market model. Empirically, we show that these desirable properties of performative hiring policies do generalize to our own formulation of a general equilibrium labor market. On the other hand, we observe that in our formulation a performative firm both harms workers by reducing their aggregate utility and fails to prevent discrimination when more sophisticated wage and cost structures are introduced.
Seamus Somerstep · Yuekai Sun · Ya'acov Ritov
Fri 11:40 a.m. - 11:50 a.m. | Repairing Regressors for Fair Binary Classification at Any Decision Threshold (Oral)
We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity, thereby attaining common notions of group fairness like Equalized Odds or Equal Opportunity at all thresholds. We demonstrate on two fairness benchmarks that our technique works well empirically, while also outperforming and generalizing similar techniques from related work.
Kweku Kwegyir-Aggrey · Jessica Dai · A. Feder Cooper · John Dickerson · Suresh Venkatasubramanian · Keegan Hines
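A rough one-dimensional quantile-matching sketch in the spirit of the optimal-transport post-processing described above (not the authors' algorithm); mapping each group's scores onto the pooled score distribution makes the selection rate at any single threshold similar across groups:

import numpy as np

def repair_scores(scores, group):
    pooled_sorted = np.sort(scores)
    repaired = np.empty_like(scores, dtype=float)
    for g in np.unique(group):
        s = scores[group == g]
        # Empirical quantile of each score within its own group...
        q = np.argsort(np.argsort(s)) / max(len(s) - 1, 1)
        # ...mapped to the corresponding quantile of the pooled distribution.
        repaired[group == g] = np.quantile(pooled_sorted, q)
    return repaired

rng = np.random.default_rng(0)
group = rng.integers(0, 2, 2000)
scores = rng.normal(loc=np.where(group == 1, 0.6, 0.4), scale=0.15)
fixed = repair_scores(scores, group)
for g in (0, 1):   # selection rates at threshold 0.5, before vs. after repair
    print(g, (scores[group == g] > 0.5).mean(), (fixed[group == g] > 0.5).mean())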
Fri 12:00 p.m. - 12:30 p.m. | Invited Talk 3: Kun Zhang: At the Intersection of Algorithmic Fairness and Causal Representation Learning
Fri 12:30 p.m. - 12:45 p.m. | Invited talk Q&A
Fri 12:45 p.m. - 12:55 p.m. | Contributed Talk 3: Learning in reverse causal strategic environments with ramifications on two sided markets (Seamus Somerstep · Yuekai Sun · Ya'acov Ritov)
Fri 12:55 p.m. - 1:00 p.m. | Contributed Talk 3 Q&A
Fri 1:00 p.m. - 1:10 p.m. | Contributed Talk 4: Repairing Regressors for Fair Binary Classification at Any Decision Threshold (Kweku Kwegyir-Aggrey · Jessica Dai · A. Feder Cooper · John Dickerson · Suresh Venkatasubramanian · Keegan Hines)
Fri 1:10 p.m. - 1:15 p.m. | Contributed Talk 4 Q&A
Fri 1:20 p.m. - 1:50 p.m. | Invited Talk 4: Ioana Baldini: Uncovering Hidden Bias: Auditing Language Models with a Social Stigma Lens
Fri 1:50 p.m. - 2:00 p.m. | Invited talk Q&A
Fri 2:00 p.m. - 2:40 p.m. | Panel: Kun Zhang, Ioana Baldini, Baobao Zhang, Tom Goldstein, Yacine Jernite
Fri 2:40 p.m. - 2:50 p.m. | Closing remarks
Fri 2:50 p.m. - 3:30 p.m. | Poster session 2
-
|
Information-Theoretic Bounds on The Removal of Attribute-Specific Bias From Neural Networks
(
Poster
)
>
Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for predictions is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. In this work, we mathematically and empirically reveal an important limitation of attribute bias removal methods in presence of strong bias. Specifically, we derive a general non-vacuous information-theoretical upper bound on the performance of any attribute bias removal method in terms of the bias strength. We provide extensive experiments on synthetic, image, and census datasets to verify the theoretical bound and its consequences in practice. Our findings show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak, thus cautioning against the use of these methods in smaller datasets where strong attribute bias can occur, and advocating the need for methods that can overcome this limitation. |
Jiazhi Li · Mahyar Khayatkhoei · Jiageng Zhu · Hanchen Xie · Mohamed Hussein · Wael Abd-Almageed 🔗 |
-
|
Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification
(
Poster
)
>
Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning. |
A. Feder Cooper · Katherine Lee · Madiha Choksi · Solon Barocas · Christopher De Sa · James Grimmelmann · Jon Kleinberg · Siddhartha Sen · Baobao Zhang 🔗 |
-
|
Procedural Fairness Through Decoupling Objectionable Data Generating Components
(
Poster
)
>
We reveal and address the frequently overlooked yet important issue of disguised procedural unfairness, namely, the potentially inadvertent alterations on the behavior of neutral (i.e., not problematic) aspects of data generating process, and/or the lack of procedural assurance of the greatest benefit of the least advantaged individuals. Inspired by John Rawls's advocacy for pure procedural justice (Rawls, 1971; 2001), we view automated decision-making as a microcosm of social institutions, and consider how the data generating process itself can satisfy the requirements of procedural fairness. We propose a framework that decouples the objectionable data generating components from the neutral ones by utilizing reference points and the associated value instantiation rule. Our findings highlight the necessity of preventing disguised procedural unfairness, drawing attention not only to the objectionable data generating components that we aim to mitigate, but also more importantly, to the neutral components that we intend to keep unaffected. |
Zeyu Tang · Jialu Wang · Yang Liu · Peter Spirtes · Kun Zhang 🔗 |
-
|
Exploring Predictive Arbitrariness as Unfairness via Predictive Multiplicity and Predictive Churn
(
Poster
)
>
For models to be fair, predictions should not be arbitrary. Predictions can be considered arbitrary if small perturbations in the training data or model specification result in changed decisions for some individuals.In this context, predictive multiplicity, or predictive variation over a set of near-optimal models, has been proposed as a key measure of arbitrariness.Separate from fairness research, another type of predictive inconsistency arises in the context of models that are continuously updated with new data.In this setting, the instability metric is predictive churn: expected prediction flips over two models trained consecutively. Interestingly, these streams of research and measures of predictive inconsistency have been studied largely independently, although sometimes conflated. In this paper, we review these notions and study their similarities and differences on real datasets. We find that they do in fact measure distinct notions of arbitrariness, that they are not immediately mitigated by using uncertainty-aware prediction methods, and that they both exhibit strong dependence on both data and model specification. |
Jamelle Watson-Daniels · Lance Strait · Mehadi Hassen · Amy Skerry-Ryan · Alexander D'Amour 🔗 |
-
|
Improving Fairness-Accuracy tradeoff with few Test Samples under Covariate Shift
(
Poster
)
>
Covariate shift in the test data can significantly downgrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups in such settings is of paramount importance due to societal implications like criminal justice. We operate under the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available. Towards this problem, we make three contributions. First is a novel composite weighted entropy based objective for prediction accuracy which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. Asymmetric covariate shift occurs when distribution of covariates of one group shifts significantly compared to the other groups and this happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, We show that our proposed method significantly outperforms them. Our third contribution is theoretical, where we show that our weighted entropy term along with prediction loss on the training set approximates test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance which affects many other baselines. |
Shreyas Havaldar · Jatin Chauhan · Karthikeyan Shanmugam · Jay Nandy · Aravindan Raghuveer 🔗 |
-
|
Loss Modeling for Multi-Annotator Datasets
(
Poster
)
>
Accounting for the opinions of all annotators of a dataset is critical for fairness. However, when annotating large datasets, individual annotators will frequently provide thousands of ratings which can lead to fatigue. Additionally, these annotation processes can occur over multiple days which can lead to an inaccurate representation of an annotator's opinion over time. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, we demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data. |
Uthman Jinadu · Jesse Annan · Shanshan Wen · Yi Ding 🔗 |
-
|
Measuring fairness of synthetic oversampling on credit datasets
(
Poster
)
>
Machine Learning models often face performance issues due to class imbalance, a common problem characterized by datasets that are biased towards a so called majority class. Oversampling the minority class through synthetic generators has become a popular solution for balancing data, giving rise to a lot of rebalancing techniques, like ADASYN and SMOTE. Practitioners usually lean on performance metrics in order to either refute or advocate for the adoption of some resampling method. However, considering the increasing ethical and legal demands for fair machine learning models, it is important to test the neutrality of these methods with respect to fairness. We conducted an investigation of the effects of oversampling on gender bias by analyzing statistical parity difference (SPD) and equal opportunity difference (EOD) obtained from four credit datasets. Similarly to performance, fairness impact caused by synthetic minority oversampling showed to be more significant for weak classifiers. Our results suggest that synthetic oversampling should be used with caution in order to avoid amplifying or even creating biased data. |
Decio Miranda Filho · Thalita Veronese · Marcos M. Raimundo 🔗 |
-
|
Transparency Through the Lens of Recourse and Manipulation
(
Poster
)
>
Individuals often seek to reverse undesired outcomes in interactions with automated systems, such as loan denials, by modifying their features. These reversions can occur through either system-recommended actions, known as ``recourse'', or through manipulation actions such as misreporting feature values. Providing recourse can benefit users by enabling feature improvements (e.g., improving creditworthiness by paying off debt) and enhance the system's own utility (e.g., by creating more credit worthy individuals to whom the system can lend) However, providing recourse also increases the transparency of the decision rule and thus introduces opportunities for strategic individuals to better exploit the system; this is particularly true when groups of agents share information (e.g., sharing graduate school admission information on websites such as GradCafe). This natural tension will ultimately decide whether or not the system elects to provide recourse, this differs from current literature, which presumes the system's willingness to provide recourse without investigating the rationality of such readiness. To address this gap, we propose a framework through which the interplay of transparency, recourse, and manipulation can be investigated. Within this framework, we demonstrate that a rational system is frequently incentivized to provide only a small fraction of agents with recourse actions. We capture the social-cost of the system's hesitance to provide recourse and demonstrate that rotational behavior of the system results in a systemic decrease to population's total utility. Further, we find that this utility decrease can fall disproportional on sensitive groups within the population (such as those defined by race of gender). |
Yatong Chen · Andrew Estornell · Yevgeniy Vorobeychik · Yang Liu 🔗 |
-
|
Variation of Gender Biases in Visual Recognition Models Before and After Finetuning
(
Poster
)
>
We introduce a framework to measure how biases change before and after fine-tuning a large scale visual recognition model for a downstream task. Deep learning models trained on increasing amounts of data are known to encode societal biases. Many computer vision systems today rely on models typically pretrained on large scale datasets. While bias mitigation techniques have been developed for tuning models for downstream tasks, it is currently unclear what are the effects of biases already encoded in a pretrained model. Our framework incorporates sets of canonical images representing individual and pairs of concepts to highlight changes in biases for an array of off-the-shelf pretrained models across model sizes, dataset sizes, and training objectives. Through our analyses, we find that (1) supervised models trained on datasets such as ImageNet-21k are more likely to retain their pretraining biases regardless of the target dataset compared to self-supervised models. We also find that (2) models finetuned on larger scale datasets are more likely to introduce new biased associations. Our results also suggest that (3) biases can transfer to finetuned models and the finetuning objective and dataset can impact the extent of transferred biases. |
Jaspreet Ranjit · Tianlu Wang · Baishakhi Ray · Vicente Ordonez 🔗 |
-
|
On Comparing Fair classifiers under Data Bias
(
Poster
)
>
In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases; 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data; and 3. A few simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments. |
Mohit Sharma · Amit Deshpande · Rajiv Ratn Shah 🔗 |
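The under-representation and label-bias injections audited above can be mimicked in a few lines of NumPy. This is an illustrative approximation of the Blum & Stangl (2019) setup, not the authors' implementation; the group coding, the rates beta and nu, and the direction of the bias are assumptions:

import numpy as np

rng = np.random.default_rng(0)

def inject_under_representation(X, y, group, beta, disadvantaged=1):
    # Drop each positive example of the disadvantaged group with probability beta.
    drop = (group == disadvantaged) & (y == 1) & (rng.random(len(y)) < beta)
    keep = ~drop
    return X[keep], y[keep], group[keep]

def inject_label_bias(y, group, nu, disadvantaged=1):
    # Flip positive labels of the disadvantaged group to negative with probability nu.
    y = y.copy()
    flip = (group == disadvantaged) & (y == 1) & (rng.random(len(y)) < nu)
    y[flip] = 0
    return y

Training any off-the-shelf fair classifier on the corrupted data while keeping the test data clean reproduces the style of audit the abstract describes.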
-
|
Reevaluating COMPAS: Base Rate Tracking and Racial Bias
(
Poster
)
>
COMPAS is a controversial Recidivism Assessment Instrument (RAI) that has been used in the US criminal justice system to predict recidivism in pretrial settings. Angwin et al. (2016) argued that COMPAS is biased against Blacks because it violates a fairness criterion known as equalized odds. However, COMPAS satisfies two other prominent fairness criteria known as weak calibration and predictive parity, which are known to be inconsistent with equalized odds in most realistic settings. Eva (2022) argues that weak calibration is not sufficient for algorithmic fairness and claims that a different criterion, base rate tracking, is at least a necessary condition. In this paper, we present four different natural ways of measuring how badly COMPAS violates base rate tracking, i.e., how much the average predicted risk scores across ethnic groups deviate from their actual recidivism prevalence. We find significant deviations in all cases and argue that advocates of base rate tracking do indeed have good reason to be concerned about racial bias in COMPAS. Our interdisciplinary work concludes by raising some further normative questions that remain unanswered by our analysis. |
Victor Crespo · Javier Rando · Benjamin Eva · Vijay Keswani · Walter Sinnott-Armstrong 🔗 |
-
|
Performativity and Prospective Fairness.
(
Poster
)
>
Deploying an algorithmically informed policy is a significant intervention in the structure of society. As is increasingly acknowledged, predictive algorithms have performative effects: using them can shift the distribution of social outcomes away from the one on which the algorithms were trained. Algorithmic fairness research is usually motivated by the worry that these performative effects will exacerbate the structural inequalities that gave rise to the training data. However, standard retrospective fairness methodologies are ill-suited to predict these effects. They impose static fairness constraints that hold after the predictive algorithm is trained, but before it is deployed and, therefore, before performative effects have had a chance to kick in. Yet satisfying static fairness criteria after training is not sufficient to avoid exacerbating inequality after deployment. Addressing the fundamental worry that motivates algorithmic fairness requires explicitly comparing the change in relevant structural inequalities before and after deployment. We propose a prospective methodology for estimating this post-deployment change from pre-deployment data and knowledge about the algorithmic policy. That requires a strategy for distinguishing between, and accounting for, different kinds of performative effects. In this paper, we focus on the algorithmic effect on the causally downstream outcome variable. Throughout, we are guided by an application from public administration: the use of algorithms to (1) predict who among the recently unemployed will stay unemployed for the long term and (2) target them with labor market programs. We illustrate our proposal by showing how to predict whether such policies will exacerbate gender inequalities in the labor market. |
Sebastian Zezulka · Konstantin Genin 🔗 |
-
|
Explaining knock-on effects of bias mitigation
(
Poster
)
>
In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups. Bias mitigation methods work in different ways and have known "waterfall" effects, e.g., mitigating bias at one place may manifest bias elsewhere. In this paper, we aim to characterise impacted cohorts when mitigation interventions are applied. To do so, we treat intervention effects as a classification task and learn an explainable meta-classifier to identify cohorts that have altered outcomes. We examine a range of bias mitigation strategies that work at various stages of the model life cycle. We empirically demonstrate that our meta-classifier performs well in uncovering impacted cohorts. Further, we show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e. people who receive unfavourable outcomes solely on account of mitigation efforts. This is despite improvement in fairness metrics. We use these results as a basis to argue for more careful audits of static mitigation interventions that go beyond aggregate metrics. |
Svetoslav Nizhnichenkov · Rahul Nair · Elizabeth Daly · Brian Mac Namee 🔗 |
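The "explainable meta-classifier" above can be sketched compactly: label each example by whether a mitigation step flipped its outcome, then fit a small interpretable model on the original features. A minimal scikit-learn sketch; the shallow decision tree and all names are illustrative choices, not the authors' exact method:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def explain_impacted_cohorts(X, y_pred_base, y_pred_mitigated, feature_names):
    # 1 where the mitigation intervention changed the decision, 0 elsewhere.
    changed = (np.asarray(y_pred_base) != np.asarray(y_pred_mitigated)).astype(int)
    meta = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, changed)
    # The printed rules describe which cohorts had their outcomes altered.
    return export_text(meta, feature_names=list(feature_names))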
-
|
Addressing The Cost Of Fairness In A Data Market Over Time
(
Poster
)
>
It is well understood that the data generation process is a critical factor that shapes the fairness of a machine-learning system. Since data generation is often mediated by a data market, we ask whether machine-learning fairness can be addressed in data markets as they evolve and, if so, at what cost. We revisit a well-known model of a data market in which data are allocated by a centralized marketplace. If the marketplace decides to enforce fairness, the main question is whether the natural extraction of value from data under a fairness intervention is further constrained and who is affected by it. In a natural class of allocation functions and under mild conditions, we show that no agent in the data market asymptotically loses utility as the market expands to include more buyers, even if the cost of data production is inherently biased against individuals of a particular group. Our initial results suggest that, under certain conditions, the evolution of a system may be a useful tool to address the cost of fairness. |
Augustin Chaintreau · Roland Maio · Juba Ziani 🔗 |
-
|
On Mitigating Unconscious Bias through Bandits with Evolving Biased Feedback
(
Poster
)
>
Media stereotypes, cultural stereotypes, and affinity bias are some of the driving factors shaping our unconscious biases. As the demographic landscape of the workforce evolves, this bias is subject to change, and in particular could be erased or inverted (e.g. computer programming was considered a "woman's job" in the US in the 1940s). To study this feedback loop between workforce demographics and bias, we introduce a multi-armed bandit model for which we only perceive a time-dependent biased reward, which is a function of the (evolving) fraction of times we picked each arm. We show that if we ignore the bias, UCB incurs linear regret in this setting. By contrast, when the bias model is exactly known, then an elimination-style algorithm achieves a regret at most $K^2$ times larger than in the standard, unbiased bandit setting. Moreover, we show that this regret scaling is (essentially) unimprovable by deriving a new instance-dependent regret lower bound which is roughly $K^2$ times larger than in the standard bandit setting, even in the setting where the policy knows the bias model exactly. To obtain this lower bound when the observed reward distributions are (i) time-varying and (ii) dependent on the policy's past actions, we develop new proof techniques beyond the standard bandit lower bound arguments, which may be of independent interest. In particular, we identify a "bottleneck" set of actions for which any policy must either (a) play many times, or (b) observe significantly biased samples. Then, using a stopped version of the divergence decomposition, we carefully construct a stopping time which allows us to translate cases (a) and (b) into an amplified lower bound.
|
Matthew Faw · Constantine Caramanis · Sanjay Shakkottai · Jessica Hoffmann 🔗 |
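As a toy illustration of the setting above (not the paper's model or algorithm), the sketch below runs plain UCB1 against a hypothetical bias model in which an arm's observed reward shrinks when that arm has rarely been picked; the shrinkage factor 0.5 + 0.5 * frac is an assumption made only for this example:

import numpy as np

rng = np.random.default_rng(0)
T, mu = 20_000, np.array([0.5, 0.6])           # arm 1 has the higher true mean
counts, sums = np.zeros(2), np.zeros(2)

for t in range(T):
    if t < 2:
        a = t                                  # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        a = int(np.argmax(ucb))
    frac = counts[a] / max(t, 1)               # fraction of past rounds arm a was picked
    # Hypothetical bias: rarely-picked arms look worse than they really are.
    observed = rng.binomial(1, mu[a]) * (0.5 + 0.5 * frac)
    counts[a] += 1
    sums[a] += observed

print("pulls per arm:", counts)   # bias-agnostic UCB can concentrate on the wrong arm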
-
|
Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness
(
Poster
)
>
Over the last few years, algorithmic fairness in machine learning has gained increasing traction and developed into a flourishing field of study. However, there still exists a gap between theoretical research on algorithmic fairness and its implementation in practice. Here, we show the importance of addressing this gap by demonstrating how algorithmic fairness heavily depends on the decisions made during a system's design and implementation, as biases in data can be mitigated or reinforced along the typical modeling pipeline. We present a framework that can aid in the design of robust real-world applications and help to inform the future study of algorithmic fairness. Drawing on insights from the field of psychology, we introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their implications for fairness. By combining decisions, we create a grid of all possible “universes” of decision combinations. Using the resulting dataset, we can see which decisions impact fairness and how. We demonstrate how multiverse analyses can be used to better understand the variability and robustness of algorithmic fairness using an exemplary case study of predicting public health care coverage of vulnerable populations for potential interventions. |
Jan Simson · Florian Pfisterer · Christoph Kern 🔗 |
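The grid of "universes" described above is straightforward to enumerate. A minimal sketch; the particular design decisions, their options, and the fit_and_evaluate placeholder are hypothetical, chosen only to show the mechanics:

from itertools import product

# Hypothetical modeling decisions and the options considered for each.
decisions = {
    "imputation":     ["mean", "median"],
    "protected_attr": ["keep", "drop"],
    "threshold":      [0.4, 0.5, 0.6],
    "model":          ["logreg", "tree"],
}

# One "universe" per combination of decisions.
universes = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
print(len(universes), "universes")

results = []
for u in universes:
    # fit_and_evaluate(**u) stands in for the user's own pipeline and would
    # return accuracy plus a fairness metric for this decision combination:
    # results.append({**u, **fit_and_evaluate(**u)})
    pass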
-
|
Fairer and More Accurate Models Through NAS
(
Poster
)
>
Making models algorithmically fairer on tabular data has long been studied, with techniques typically oriented towards fixes that take a neural model with an undesirable outcome and change how the data are ingested, what the model weights are, or how outputs are processed. We employ a different, emerging strategy: updating the model's architecture and training hyperparameters to find an entirely new model with better outcomes from the beginning of the debiasing procedure. In this work, we propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO) in the first application of these techniques to the challenging domain of tabular data. We conduct extensive exploration of architectural and hyperparameter spaces (MLP, ResNet, and FT-Transformer) across diverse datasets, demonstrating how the accuracy and fairness metrics of model predictions depend on hyperparameter combinations. We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns. We propose a novel approach that jointly optimizes architectural and training hyperparameters under a multi-objective constraint of both accuracy and fairness. We produce architectures that consistently Pareto-dominate state-of-the-art bias mitigation methods in fairness, accuracy, or both, while also remaining Pareto-optimal relative to hyperparameters found through single-objective (accuracy-only) optimization runs. This research underscores the promise of automating fairness and accuracy optimization in deep learning models. |
Richeek Das · Samuel Dooley 🔗 |
-
|
Causal Dependence Plots
(
Poster
)
>
Explaining artificial intelligence or machine learning models is increasingly important. To use such data-driven systems wisely we must understand how they interact with the world, including how they depend causally on data inputs. In this work we develop Causal Dependence Plots (CDPs) to visualize how one variable---a predicted outcome---depends on changes in another variable---a predictor---along with consequent causal changes in other predictor variables. Crucially, this may differ from standard methods based on holding other predictors constant or assuming they are independent, such as regression coefficients or Partial Dependence Plots (PDPs). CDPs use an auxiliary causal model to produce explanations because causal conclusions require causal assumptions. Our explanatory framework generalizes PDPs, including them as a special case, and enables a variety of other custom interpretive plots to show, for example, the total, direct, and indirect effects of causal mediation. We demonstrate with simulations and real data experiments how CDPs can be combined in a modular way with methods for causal learning or sensitivity analysis. Since people often think causally about input-output dependence, CDPs can be powerful tools in the xAI or interpretable machine learning toolkit and contribute to applications like scientific machine learning and algorithmic fairness. |
Joshua Loftus · Lucius Bynum · Sakina Hansen 🔗 |
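The contrast between a PDP and a CDP described above can be illustrated with a toy auxiliary causal model. Everything below (the structural equation x2 := 2 * x1, the stand-in predictor, and the grid) is an assumption made for illustration, not the authors' implementation:

import numpy as np
import matplotlib.pyplot as plt

def propagate(x1):
    # Assumed structural equation: changing x1 causally shifts x2.
    return 2.0 * x1

def model_predict(x1, x2):
    # Stand-in for a fitted ML model.
    return 0.5 * x1 + 0.3 * x2

grid = np.linspace(-2, 2, 50)
pdp = [model_predict(v, 0.0) for v in grid]            # x2 held fixed (PDP-style)
cdp = [model_predict(v, propagate(v)) for v in grid]   # x2 responds causally (CDP)

plt.plot(grid, pdp, label="PDP: x2 held fixed")
plt.plot(grid, cdp, label="CDP: x2 follows the causal model")
plt.xlabel("x1"); plt.ylabel("prediction"); plt.legend(); plt.show()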
-
|
Fairness in link analysis ranking algorithms
(
Poster
)
>
In this paper, we study the problem of fairness in link analysis algorithms in evolving networks. In particular, we formally show that minority groups can become under-represented in ranking algorithms such as HITS and PageRank in networks that evolve over time. We show that under-representation does not come out of nowhere: biased networks can create even more biased rankings. We use an evolving network model with multiple communities to show that homophily plays a central role in amplifying bias against minority groups in rankings based on HITS. We derive a theoretical approximation to show that bias increases in more homophilic networks, showing that the authority scores resulting from applying the HITS algorithm effectively push minorities even further down in the ranking as compared to the degree ranking. The use of evolving networks is particularly important in two ways: (1) to show that such algorithms are not deployed on static content, but on ever-evolving nodes and links that have a temporal aspect; (2) the scores that link analysis algorithms output are often used as features in learning-to-rank algorithms, implying that biased features will have a lasting effect on the fairness of many ranking schemes. We illustrate our theoretical analysis on both synthetic and real datasets. |
Ana-Andreea Stoica · Augustin Chaintreau · Nelly Litvak 🔗 |
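A small synthetic experiment in the spirit of the analysis above, using networkx; the generator (fixed out-degree and a single homophily parameter) is a simplification assumed for illustration rather than the paper's growth model:

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n, minority_frac, homophily = 200, 0.2, 0.8
group = (rng.random(n) < minority_frac).astype(int)      # 1 marks the minority

G = nx.DiGraph()
G.add_nodes_from(range(n))
for u in range(n):
    for _ in range(5):                                    # each node links to 5 others
        same = rng.random() < homophily                   # link within the same group?
        candidates = [v for v in range(n) if v != u and bool(group[v] == group[u]) == same]
        if candidates:
            G.add_edge(u, int(rng.choice(candidates)))

hubs, auth = nx.hits(G, max_iter=500)
top = sorted(auth, key=auth.get, reverse=True)[:20]
print("minority share overall:       ", group.mean())
print("minority share in top-20 auth:", np.mean([group[v] for v in top]))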
-
|
A Causal Perspective on Label Bias
(
Poster
)
>
A common setting for algorithmic decision making relies on the use of a prediction of a proxy label to decide on a specific course of action or to make a downstream decision, such as the enrollment of a patient in a care management program based on prediction of their expected healthcare expenditure. Proxy labels are used because the true label of interest may be difficult or impossible to measure in practice. However, the use of a proxy label may propagate equity-related harms when the relationship between the unmeasured true label and the proxy label differs across subgroups (e.g. by |
Vishwali Mhasawade · Alexander D'Amour · Stephen Pfohl 🔗 |
-
|
Remembering to Be Fair: On Non-Markovian Fairness in Sequential Decision Making
(
Poster
)
>
Fair decision making has largely been studied with respect to a single decision. In this paper we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of a decision. In this setting, we observe that fairness often depends on the history of the sequential decision making process and not just on the current state. To advance our understanding of this class of fairness problems, we define the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term fairness and anytime fairness. We further explore the interplay between non-Markovian fairness and memory, and how this can support construction of fair policies in sequential decision-making settings. |
Parand A. Alamdari · Toryn Klassen · Elliot Creager · Sheila McIlraith 🔗 |
-
|
FAIR-Ensemble: Homogeneous Deep Ensembling Naturally Attenuates Disparate Group Performances
(
Poster
)
>
Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, we observe that even with a simple homogeneous ensemble (all the individual DNNs share the same training set, architecture, and design choices), the minority group performance disproportionately improves with the number of models compared to the majority group, i.e. fairness naturally emerges from ensembling. Even more surprising, we find that this gain keeps occurring even when a large number of models is considered, e.g. 20, despite the fact that the average performance of the ensemble plateaus with fewer models. Our work establishes that simple DNN ensembles can be a powerful tool for alleviating disparate impact from DNN classifiers, thus curbing algorithmic harm. We also explore why this is the case. We find that even in homogeneous ensembles, varying the sources of stochasticity through parameter initialization, mini-batch sampling, and data-augmentation realizations results in different fairness outcomes. |
Wei-Yin Ko · Daniel Dsouza · Karina Nguyen · Randall Balestriero · Sara Hooker 🔗 |
-
|
Fair Clustering: Critique and Future Directions
(
Poster
)
>
Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature in fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty in accounting for the downstream effects of a fair clustering algorithm in machine learning settings. In some cases, we demonstrate examples where the application of a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering. |
John Dickerson · Seyed Esmaeili · Jamie Morgenstern · Claire Jie Zhang 🔗 |
-
|
Seller-side Outcome Fairness in Online Marketplaces
(
Poster
)
>
This paper aims to investigate and address the issue of seller-side fairness within online marketplaces, where many sellers and their items are not sufficiently exposed to customers in an e-commerce platform. This phenomenon raises concerns regarding the potential loss of revenue associated with less exposed items as well as less marketplace diversity. We introduce the notion of seller-side outcome fairness and build an optimization model to balance collected recommendation rewards and the fairness measure. We then propose a gradient-based data-driven algorithm based on the duality and bandit theory. Our numerical experiments on real e-commerce data sets show that our algorithm can lift seller fairness measures while not hurting metrics like collected Gross Merchandise Value (GMV) and CTR. |
Zikun Ye · Reza Yousefi Maragheh · Lalitesh Morishetti · Shanu Vashishtha · Jason Cho · Kaushiki Nag · Sushant Kumar · Kannan Achan 🔗 |
-
|
Mitigating stereotypical biases in text to image generative systems
(
Poster
)
>
State-of-the-art generative text-to-image models are known to exhibit social biases and over-represent certain groups, such as people of perceived lighter skin tones and men, in their outcomes. In this work, we propose a method to mitigate such biases and ensure that the outcomes are fair across different groups of people. We do this by fine-tuning text-to-image models on synthetic data that varies in perceived skin tones and genders, constructed from diverse text prompts. These text prompts are constructed from multiplicative combinations of ethnicities, genders, professions, age groups, and so on, resulting in diverse synthetic data. Our diversity fine-tuned (DFT) model improves the group fairness metric by 150% for perceived skin tone and 97.7% for perceived gender. Compared to baselines, DFT models generate more people with perceived darker skin tones and more women. To foster open research, we will release all text prompts and code to generate training images.
|
Piero Esposito · Parmida Atighehchian · Anastasis Germanidis · Deepti Ghadiyaram 🔗 |
-
|
Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework
(
Poster
)
>
While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Rényi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.
|
Sina Baharlouei · Meisam Razaviyayn 🔗 |
-
|
On The Vulnerability of Fairness Constrained Learning to Malicious Noise
(
Poster
)
>
We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. [27] initiated the study of this question and presented negative results showing there exist data distributions where for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a Θ(α) loss in accuracy, where α is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an O(√α) loss, and give a matching Ω(√α) lower bound. In contrast, [27] showed for proper learners the loss in accuracy for both notions is Ω(1). The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify his power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy clusters into three natural regimes O(α),O(√α), and O(1). These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data. |
Avrim Blum · Princewill Okoroafor · Aadirupa Saha · Kevin Stangl 🔗 |
-
|
Model Fairness is Constrained by Decision Making Strategy Design
(
Poster
)
>
Recent years have seen a marked rise in the use of machine learning and AI in hiring. Concurrently, debates on the ethics of AI-enabled tools ensued: can these decision aids alleviate the biases prevalent in human choices, or do they further exaggerate unfairness? On the one hand, statistical tools can make evaluations more standardized. On the other, these models can provide biased estimates when trained on data that are unbalanced with respect to personal characteristics. Here, I present evidence that even prior to the data coming into play, a model's utility is first constrained by how the decision problem is formulated. Consider two hypothetical approaches to deciding how to target job advertisements: Company A builds a model to identify the candidates most similar to their previous applicants with the goal of maximizing the click through rate; Company B has been collecting performance measures for their previous hires and aims to target job ads to those candidates that are predicted to be high achievers. How do these approaches compare on long-term fairness? I conducted a simulation study to mimic a hiring pipeline and evaluate the two approaches on the tendency to propagate the group bias and on the resulting performance of selected candidates. The results suggest that targeting candidates based on their similarity to previous applicants, as in Company A's case, leads to an increase in the categorical group bias concurrent with a decrease in performance. I further found that this approach specifically disadvantages high performers in the underrepresented group. In contrast, Company B's decision to predict performance fares better: it does not affect the bias or performance. This demonstrates that how a company designs their decision-making strategy affects fair candidate evaluation and the resulting performance. Hiring is effortful, fair hiring is even harder, but the fruits of the labor are also greater: in fairness, in performance, and in the bottom line. |
Alexandra Stolyarova 🔗 |
-
|
Algorithmic Fairness Reproducibility: A Close Look at Data Usage over the Years
(
Poster
)
>
Algorithmic fairness has become a significant area of research in recent years, with a growing body of work aimed at addressing bias and discrimination in machine learning systems. Like many other fields, however, this expanding field faces the challenge of ensuring the reliability and reproducibility of research findings. This is particularly the case with respect to the datasets used, as it has been shown that a large body of research in the field is built upon a small number of datasets. Moreover, many of these popular datasets have been found to have significant flaws, and recent large-scale studies have shown alarming signs of low reproducibility. In this work-in-progress, we examine the landscape of dataset usage in algorithmic fairness research, shedding light on the practices employed and the challenges encountered. We present an ongoing investigation into the use of tabular datasets within algorithmic fairness research. Our preliminary results point to a two-fold reproducibility issue. On the one hand, data are not uniformly preprocessed. The same basic datasets are often used quite differently, with researchers performing different processing on columns (especially columns used as protected attributes), using different subsets of the data, or using different columns for prediction tasks. On the other hand, many papers do not provide enough information to reliably identify not only which preprocessing steps were applied, but also which specific resource was used, as popular datasets such as the widely used COMPAS dataset come with different subsets of data. We encourage the adoption of practices from the open science community, such as sharing of code, to improve the transparency, reproducibility, and comparability of research in algorithmic fairness. To ease this process, we plan to release a Python package for standardized loading and preprocessing of datasets for use in the fairness literature. |
Jan Simson · Alessandro Fabris · Christoph Kern 🔗 |
-
|
Bayesian Multilevel Regression and Poststratification for Dynamic Diversity-Aware Modeling
(
Poster
)
>
Fairness in algorithmic decision-making often relies on predefined notions within a stable data landscape. However, the world we strive to model is a symphony of unprecedented change and nuanced interactions. This dynamic nature, intrinsic to evolving societies, is frequently overlooked in traditional fairness studies. We introduce a quantitative framework for diversity-aware population modeling, leveraging a Bayesian Multilevel Regression and Poststratification (MRP) strategy to mitigate unrepresentative data distributions and observed biases. Our approach integrates common and individual sources of variance in a hierarchical network, offering a unified and flexible platform to directly capture and quantify major sources of population stratification at multiple stages of the modeling process. Our framework primarily centers on post-processing fairness techniques, reconciling existing statistical methods, specifically poststratification, with expressive generative models. We utilize the Adolescent Brain Cognitive Development (ABCD) cohort, a collaborative aggregation of longitudinal data from 11,000+ children aged 9-10 across 17 US states, for our proof-of-principle study. We model the effect of socioeconomic status on cognitive development, accounting for geographical and racial disparities. Poststratification provides ample room to analyze the data distributions through a temporal lens, as confounding factors morph the real-world representations of both regional and ethnic predictors. The integration of census data, subject to annual updates, into our hierarchical model plays a pivotal role in capturing this complexity, enabling us to finely adjust our posterior estimates by carefully recalibrating them based on precise state- and race-level proportions. We demonstrate that Bayesian MRP can be tailored to develop diversity-aware population models, providing crucial insights into dynamic fairness for generative modeling. |
Nicole Osayande · Danilo Bzdok 🔗 |
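At its core, the poststratification step described above reweights cell-level estimates by census population shares. A minimal pandas sketch; the states, groups, numbers, and column names are invented for illustration, and the cell estimates would in practice come from the fitted multilevel model:

import pandas as pd

# Hypothetical cell-level model estimates and census counts.
cells = pd.DataFrame({
    "state":  ["CA", "CA", "TX", "TX"],
    "group":  ["A",  "B",  "A",  "B"],
    "y_hat":  [0.62, 0.48, 0.55, 0.41],      # model estimate within each cell
    "census": [4.0e6, 1.0e6, 2.5e6, 1.5e6],  # population size of each cell
})

# Poststratification: weight each cell's estimate by its true population share.
cells["weight"] = cells["census"] / cells["census"].sum()
print("poststratified estimate:", (cells["y_hat"] * cells["weight"]).sum())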
-
|
The Long-Term Effects of Personalization: Evidence from Youtube
(
Poster
)
>
We propose an experimental design based on a browser extension. In our ongoing study, we explore the long-term effects of personalization on content consumption on youtube.com. Participants, upon installing the extension, are randomly assigned to Treatment or Control groups. The Control group sees default personalized recommendations, with the extension collecting and sending their usage data to our server. For the Treatment group, some recommendations are replaced with generic, non-personalized ones, akin to the experience of a logged-out user. With 500 U.S. participants using the extension for three months, our study aims to address questions like: "Does personalization increase platform engagement?", "Does it reduce content diversity?", and "Do generic recommendations introduce a popularity bias?". Our methodology allows us to gauge outcomes in relation to the strength of the de-personalization intervention, as we document both personalized and un-personalized recommendations for every user interaction. |
Andreas Haupt · Mihaela Curmei · François-Marie de Jouvencel · Marc Faddoul · Benjamin Recht · Dylan Hadfield-Menell 🔗 |
-
|
Allocating Bonus Points in Sequential Matchings with Preference Dynamics
(
Poster
)
>
Allocating bonus points (BP) in college admissions is a popular form of affirmative action aimed at increasing the representation of protected groups in different college programs. We propose to explore the effect of BP policies on the preferences and admission of future generations of applicants. Inspired by the Norwegian college admission system, we propose to model the repeated centralized college admission procedure as a Markov decision process (MDP) where students have evolving preferences and are admitted to different programs via a stable matching algorithm. Here, we assume that the current representation rate of a group in the different study programs affects the group's preferences in the next time step. Given this framework, we present two research objectives: 1) exploring trade-offs between student preference satisfaction, success and representation, and 2) analysing the existence, desirability and reachability of stable states. |
Meirav Segal · Liu Leqi · Anne-Marie George · Christos Dimitrakakis · Hoda Heidari 🔗 |
-
|
Equal Opportunity under Performative Effects
(
Poster
)
>
There is a growing interest in automating decision-making by using machine learning (ML) models to estimate scores that can be used to rank candidates. Consider, as an example, loan application decisions. An institution might train an ML model from historical data to predict the probability that a candidate will default on their loan, based on various features in their application. For new applicants, the trained model predicts a score that the institution can use to approve or reject loan applications, or at least rank the applicants for further review.In these ML-based decision-making contexts, the field of algorithmic fairness has developed a number of metrics to assess the disparity faced by different demographic groups, as well as by individuals. However, algorithmic decision-making is often dynamic, with individuals responding to the deployment of ML models and their automated predictions. In the loan example, rejected applicants may adapt their applications to get a better outcome, potentially by gaming the classifier or making changes to become more loan-worthy candidates in the future. These dynamics lead to a feedback loop between the model’s prediction and the true outcome (e.g., defaulting on a loan or not), which is hard to analyze and is often ignored by existing fairness metrics, which assume a static data-generating process. Recently the fields of strategic classification and performative prediction have begun to address this phenomenon in the context of machine learning, but most works ignore potential disparities between different segments of the population. In this emerging line of work, models often do not account for factors that may result in demographic groups adapting differently, which can lead to unfair decisions when applied to the different segments of the population. |
Sophia Gunluk · Dhanya Sridhar · Antonio Gois · Simon Lacoste-Julien 🔗 |
-
|
Assessing Perceived Fairness in Machine Learning (ML) Process: A Conceptual Framework
(
Poster
)
>
In ML applications, “unfairness” can be caused by bias in the data, curation process, erroneous assumptions, and implicit bias rendered within the algorithmic development process. As ML applications come into broader use, developing fair ML applications is critical. Assessing fairness and developing fair ML applications has become important in the era of Responsible AI in practice in research, industry, and academia. However, a literature survey suggests that fairness in ML is very subjective, and there is no coherent way to describe the fairness of AI/ML processes and applications. To better understand the perception of fairness in the ML process, we conducted virtual focus groups with developers, reviewed prior literature, and integrated notions of justice theory to propose that perceived fairness is a multidimensional concept. In this paper, we will explore the initial outcomes of this effort. |
Anoop Mishra · Deepak Khazanchi 🔗 |
-
|
Unbiased Sequential Prediction for Fairness in Predictions-to-Decisions Pipelines
(
Poster
)
>
We develop an efficient sequential (online adversarial) algorithm for making high-dimensional vector predictions --- to be fed into a downstream decision maker --- such that these predictions are fair in various senses with respect to the predictions-to-decisions pipeline, and in particular fair conditional on downstream actions taken by the decision maker as a function of these predictions. (1) As a major example, in the online problem where at each round, a subset of k-many candidates is selected by a downstream selection method as a function of our predictions about each of these candidates' success rates, our algorithm can give predictions that are fair to each candidate, in the sense that the candidate's average predicted probability is unbiased relative to the true empirical average, both over those days when the candidate was selected and over days when the candidate wasn't selected. Thus, e.g., each candidate can be assured that their probability of success is not systematically underestimated whenever they aren't picked, and conversely, that no other candidate's chances of success are systematically overestimated when they are picked. (2) Via another instantiation of our prediction algorithm, we improve on the online multigroup fairness result of Blum and Lykouris (2020), who produce predictions with no external regret conditional on all demographic groups of interest. We directly improve on this by providing the much stronger no swap regret guarantees for each group. (3) Our efficient algorithm, in its general form, can make unbiased predictions conditional on any polynomially many subsequences of rounds, which can be functions of our predictions themselves. This is classically achievable via (full) online calibration (Foster and Vohra, 1998), which is however inefficient and statistically suboptimal in high dimensions. Our algorithm is thus a novel computationally and statistically efficient relaxation of calibration. |
Georgy Noarov · Ramya Ramalingam · Aaron Roth · Stephan Xie 🔗 |
-
|
Deep Reinforcement Learning for Efficient and Fair Allocation of Healthcare Resources
(
Poster
)
>
Scarcity of health care resources could result in the unavoidable consequence of rationing. For example, ventilators are often limited in supply, especially during public health emergencies or in resource-constrained health care settings, such as amid the pandemic of COVID-19. Currently, there is no universally accepted standard for health care resource allocation protocols, resulting in different governments prioritizing patients based on various criteria and heuristic-based protocols. In this study, we investigate the use of reinforcement learning for critical care resource allocation policy optimization to fairly and effectively ration resources. We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients during the critical care resource allocation. We aim to improve both fairness of allocation and overall patient outcomes. Our experiments demonstrate that our method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, when compared to existing severity-based and comorbidity-based methods in use by different governments. |
Yikuan Li 🔗 |
-
|
What Comes After Auditing: Distinguishing Between Algorithmic Errors and Task Specification Issues
(
Poster
)
>
General-purpose generative AI models (GMs) have demonstrated remarkable capabilities, but they have also exhibited instances of inappropriate or harmful behavior, often stemming from the inherent subjectivity of the tasks they undertake. While auditing and benchmarking work provide a vital starting point in understanding the harms perpetuated by GMs, the proposed solutions for updating GMs often reveal a disconnect with the nuance of task subjectivity. Consequently, we argue for the importance of distinguishing between task specification issues and algorithmic error issues both conceptually and methodologically in handling them, to comprehensively mitigate algorithm harm. |
Charvi Rastogi 🔗 |
-
|
Designing Long-term Group Fair Policies in Dynamical Systems
(
Poster
)
>
Neglecting the effect that decisions have on individuals (and thus, on the underlying data distribution) when designing algorithmic decision-making policies may increase inequalities and unfairness in the long term—even if fairness considerations were taken in the policy design process. In this paper, we propose a novel framework for achieving long-term group fairness in dynamical systems, in which current decisions may affect an individual’s features in the next step, and thus, future decisions. Specifically, our framework allows us to identify a time-independent policy that converges, if deployed, to the targeted fair stationary state of the system in the long term, independently of the initial data distribution. We model the system dynamics with a time-homogeneous Markov chain and optimize the policy leveraging the Markov chain convergence theorem to ensure unique convergence. We provide examples of different targeted fair states of the system, encompassing a range of long-term goals for society and policy makers. Furthermore, we show how our approach facilitates the evaluation of different long-term targets by examining their impact on the group-conditional population distribution in the long term and how it evolves until convergence. |
Miriam Rateike · Isabel Valera · Patrick Forré 🔗 |
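The convergence argument above rests on a time-homogeneous Markov chain having a unique stationary distribution. The sketch below checks that numerically for a toy three-state system; the transition matrices and the per-state acceptance policy are assumptions made only to illustrate the computation:

import numpy as np

def transition_matrix(pi):
    # Toy dynamics: rejected individuals follow P, accepted individuals tend to move "up".
    P = np.array([[0.7, 0.3, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.3, 0.7]])
    up = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]])
    return (1 - pi)[:, None] * P + pi[:, None] * up

def stationary_distribution(P):
    # Left eigenvector of P with eigenvalue 1 (unique for an ergodic chain).
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

pi = np.array([0.1, 0.5, 0.9])   # time-independent acceptance probability per state
print(stationary_distribution(transition_matrix(pi)))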
-
|
Backtracking Counterfactual Fairness
(
Poster
)
>
In this work, we introduce backtracking counterfactual fairness, a novel definition of counterfactual fairness that uses backtracking rather than interventional counterfactuals. This definition captures the following intuition: would changing your predicted outcome place an undue burden on you? Our definition is compatible with different normative choices about what constitutes an undue burden. Backtracking counterfactuals, unlike interventional counterfactuals, consider counterfactual worlds in which the causal mechanisms remain unchanged. This allows backtracking counterfactual fairness to avoid one of the key sociological and normative tensions running through other counterfactual-based fairness notions: modularity. We demonstrate how our proposal relates to other notions of fairness and fair recourse on both real and simulated data, suggesting a novel way to make use of causal information for more equitable decision making and a possible path to considering counterfactual-based fairness notions even in the presence of non-modular variables. |
Lucius Bynum · Joshua Loftus · Julia Stoyanovich 🔗 |
-
|
Learning in reverse causal strategic environments with ramifications on two sided markets
(
Poster
)
>
Motivated by equilibrium models of labor markets, we develop a formulation of causal strategic classification in which strategic agents can directly manipulate their outcomes. As an application, we consider an employer that seeks to anticipate the strategic response of the labor force when developing a hiring policy. We show theoretically that such performative (optimal) hiring policies improve employer and labor force welfare (compared to employers that do not anticipate the strategic labor force response) in the classic Coate-Loury labor market model. Empirically, we show that these desirable properties of performative hiring policies do generalize to our own formulation of a general equilibrium labor market. On the other hand, we observe that in our formulation a performative firm both harms workers, by reducing their aggregate utility, and fails to prevent discrimination when more sophisticated wage and cost structures are introduced. |
Seamus Somerstep · Yuekai Sun · Ya'acov Ritov 🔗 |
-
|
Repairing Regressors for Fair Binary Classification at Any Decision Threshold
(
Poster
)
>
We study the problem of post-processing a supervised machine-learned regressor to maximize fair binary classification at all decision thresholds. By decreasing the statistical distance between each group's score distributions, we show that we can increase fair performance across all thresholds at once, and that we can do so without a large decrease in accuracy. To this end, we introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups. Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity, thereby attaining common notions of group fairness like Equalized Odds or Equal Opportunity at all thresholds. We demonstrate on two fairness benchmarks that our technique works well empirically, while also outperforming and generalizing similar techniques from related work. |
Kweku Kwegyir-Aggrey · Jessica Dai · A. Feder Cooper · John Dickerson · Suresh Venkatasubramanian · Keegan Hines 🔗 |
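In one dimension, the optimal-transport repair described above amounts to mapping each group's scores through the pooled quantile function. A minimal NumPy sketch of that idea (often called total or geometric repair); the interpolation parameter lam and the pooled target are choices made here, not the authors' exact algorithm:

import numpy as np

def repair_scores(scores, group, lam=1.0):
    # Move each group's score distribution toward the pooled one via 1-D quantile
    # mapping (the optimal transport map in one dimension); lam = 1 is full repair.
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    repaired = scores.copy()
    for g in np.unique(group):
        idx = np.where(group == g)[0]
        ranks = scores[idx].argsort().argsort()
        q = (ranks + 0.5) / len(idx)          # within-group quantile of each score
        target = np.quantile(scores, q)       # pooled quantile function at those levels
        repaired[idx] = (1 - lam) * scores[idx] + lam * target
    return repaired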
-
|
It’s About Time: Fairness and Temporal Depth
(
Poster
)
>
This paper considers temporal depth as a conceptual framework for simplifying and reasoning about algorithmic fairness. In typical fairness applications greater temporal depth generally corresponds to stronger fairness requirements. We describe how to apply our temporal heuristics in both observational and causal probability models and their corresponding fairness definitions. As an example conclusion, one of our heuristics implies that equality of opportunity essentially justifies all disparities. In the framework of counterfactual fairness, we use temporal depth of counterfactuals to reason about common ideals like opportunity and merit, critique other causal criteria involving direct and indirect effects, and comment on long-standing debates about causation without manipulation and the use of socially constructed traits as causes. There are diverse and potentially conflicting criteria for algorithmic fairness. Heuristics like temporal depth can help us reason about fairness in a unified way, compare differing criteria, and make good decisions. |
Joshua Loftus 🔗 |
-
|
Are computational interventions to advance fair lending robust to different modeling choices about the nature of lending?
(
Poster
)
>
To what degree are common interventions to improve the fairness of lending decisions based on machine learning models robust to modeling choices about the nature of lending? In this paper, we focus on the following modeling choices: 1) whether consumer and lender welfare is naturally aligned, 2) whether consumer interests are uniform, 3) whether loan decisions are binary (lend/don't lend) or continuous (varied loan terms), and 4) whether the cost of interventions are shouldered by lenders or passed along to consumers. For a variety of common interventions, we find that varying these modeling choices can lead to very different conclusions about how interventions impact consumer welfare and whether interventions actually help the consumers they intend to help. We discuss three such interventions: the use of alternative data, quantitative fairness constraints, and counterfactual explanations. We show that interventions that would seem likely to advance consumer welfare under certain modeling choices could end up undermining consumer welfare under reasonable alternative choices. |
Benjamin Laufer · Manish Raghavan · Solon Barocas 🔗 |
-
|
Improving Fairness in Facial Recognition Models with Distribution Shifts
(
Poster
)
>
In this paper, we aim to improve the robustness of machine learning algorithms for facial recognition when the underlying datasets have distribution shifts. This is particularly important when we design fair algorithms for different demographics and changing environments. A classification algorithm trained on one face dataset is sensitive to evaluation on a different face dataset. Even if we have access to enough data, as time goes by, the distribution of the target data may shift due to the aging of the population, the environment, the proportions of demographics, and so on. We first address this issue by providing empirical studies of the out-of-distribution effect on some popular face datasets. Through exposure to auxiliary datasets and outliers, we provide ways to improve model performance when training and testing data come from different distributions. Furthermore, class imbalance and distribution shift can happen simultaneously. We emphasize the need to consider both and showcase the performance of the model on different face dataset combinations. |
Gianluca Barone · Aashrit Cunchala · Rudy Nunez · Nicole Yang 🔗 |
-
|
Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets
(
Poster
)
>
link
Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges. |
Song Wei · Xiangrui Kong · Sarah Huestis-Mitchell · Yao Xie · Shixiang Zhu · Alinson Xavier · Feng Qiu 🔗 |
-
|
Democratise with Care: The need for fairness specific features in user-interface based open source AutoML tools
(
Poster
)
>
AI is increasingly playing a pivotal role in businesses and organizations, impacting the outcomes and interests of human users. Automated Machine Learning (AutoML) streamlines the machine learning model development process by automating repetitive tasks and making data-driven decisions, enabling even non-experts to construct high-quality models efficiently. This democratization allows more users (including non-experts) to access and utilize state-of-the-art machine-learning expertise. However, AutoML tools may also propagate bias through the way they handle data, model choices, and optimization approaches. We conducted an experimental study of user-interface-based open-source AutoML tools (DataRobot, H2O Studio, Dataiku, and RapidMiner Studio) to examine whether they have features to assist users in developing fairness-aware machine learning models. The experiments covered the following considerations for the evaluation of features: understanding use-case context, data representation, feature relevance and sensitivity, data bias and preprocessing techniques, data handling capabilities, training-testing split, hyperparameter handling and constraints, fairness-oriented model development, explainability, and the ability for the user to download and edit models. The results revealed inadequacies in features that could support fairness-aware model development. Further, the results also highlight the need to establish certain essential features for promoting fairness in AutoML tools. |
Sundaraparipurnan Narayanan 🔗 |