
Workshop
Algorithmic Fairness through the Lens of Causality and Privacy
Awa Dieng · Miriam Rateike · Golnoosh Farnadi · Ferdinando Fioretto · Matt Kusner · Jessica Schrouff

Sat Dec 03 05:30 AM -- 02:55 PM (PST) @ Room 392

As machine learning models permeate every aspect of decision-making systems in consequential areas such as healthcare and criminal justice, it has become critical for these models to satisfy trustworthiness desiderata such as fairness, interpretability, accountability, privacy, and security. Initially studied in isolation, these fields have recently seen work emerge at their intersection, raising interesting questions about how fairness can be achieved from a causal perspective and under privacy constraints.

Indeed, the field of causal fairness has seen a large expansion in recent years, notably as a way to counteract the limitations of initial statistical definitions of fairness. While a causal framing provides flexibility in modelling and mitigating sources of bias using a causal model, proposed approaches rely heavily on assumptions about the data generating process, namely the faithfulness and ignorability assumptions. This leads to open discussions on (1) how to fully characterize causal definitions of fairness, (2) how, if possible, to improve the applicability of such definitions, and (3) what constitutes a suitable causal framing of bias from a sociotechnical perspective.

Additionally, while most existing work on causal fairness assumes observed sensitive attribute data, such information is often unavailable due to, for example, data privacy laws or ethical considerations. This observation has motivated initial work on training and evaluating fair algorithms without access to sensitive information, and on studying the compatibility of, and trade-offs between, fairness and privacy. However, such work has been limited, for the most part, to statistical definitions of fairness, raising the question of whether these methods can be extended to causal definitions.

Given the interesting questions that emerge at the intersection of these fields, this workshop aims to investigate not only how these topics relate, but also how they can augment each other to provide better, or better-suited, definitions and mitigation strategies for algorithmic fairness.

Sat 5:30 a.m. - 5:40 a.m. Opening remarks - online (by organizers)

Sat 5:40 a.m. - 6:10 a.m. Invited Talk (Talk) Razieh Nabi

Sat 6:10 a.m. - 6:20 a.m. Invited talk Q&A (Q&A)

Sat 6:20 a.m. - 6:50 a.m. Invited Talk (Talk) Deirdre Mulligan

Sat 6:50 a.m. - 7:00 a.m. Invited talk Q&A (Q&A)

Sat 7:00 a.m. - 7:30 a.m. Online Roundtables - please use the Zoom link corresponding to the table you would like to join (Roundtables)

Sat 7:00 a.m. - 7:30 a.m. Causality Roundtable (online roundtable) Dhanya Sridhar · Amanda Coston
Zoom link: https://hecmontreal.zoom.us/j/84688477383?pwd=bGE1TitVM1QwN1JKMVU5RFEvTVk1Zz09

Sat 7:00 a.m. - 7:30 a.m. Privacy Roundtable (online roundtable) Ulrich Aïvodji
Zoom link: https://hecmontreal.zoom.us/j/85421948233?pwd=YVRiZk5FTllZaURZcWxTZ3JUdlpsUT09

Sat 7:00 a.m. - 7:30 a.m. Ethics Roundtable (online roundtable) Negar Rostamzadeh · Sina Fazelpour · Nyalleng Moorosi
Zoom link: https://hecmontreal.zoom.us/j/86708623178?pwd=OHFqRHU2R3pEOUZJSnQreVgzTG5pdz09

Sat 7:30 a.m. - 8:00 a.m. Break

Sat 8:00 a.m. - 8:10 a.m. Opening remarks (by organizers) Awa Dieng

Sat 8:10 a.m. - 8:40 a.m. Invited Talk (Talk) Nicolas Papernot

Sat 8:40 a.m. - 8:55 a.m. Invited talk Q&A (Q&A)

Sat 8:55 a.m. - 9:00 a.m. Coffee break

Sat 9:00 a.m. - 9:10 a.m. Disparate Impact in Differential Privacy from Gradient Misalignment (Oral)
As machine learning becomes more widespread throughout society, aspects including data privacy and fairness must be carefully considered, and are crucial for deployment in highly regulated industries. Unfortunately, the application of privacy-enhancing technologies can worsen unfair tendencies in models. In particular, one of the most widely used techniques for private model training, differentially private stochastic gradient descent (DPSGD), frequently intensifies disparate impact on groups within data. In this work we study the fine-grained causes of unfairness in DPSGD and identify gradient misalignment due to inequitable gradient clipping as the most significant source. This observation leads us to a new method for reducing unfairness by preventing gradient misalignment in DPSGD.
Maria Esipova · Atiyeh Ashari · Yaqiao Luo · Jesse Cresswell

Sat 9:10 a.m. - 9:15 a.m. Contributed talk Q&A (Q&A)

Sat 9:15 a.m. - 9:25 a.m. Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes (Oral)
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences. Fair ML has largely focused on the protection of single attributes in the simpler setting where both attributes and target outcomes are binary. However, practical applications in many real-world problems entail the simultaneous protection of multiple sensitive attributes, which are often not simply binary, but continuous or categorical. To address this more challenging task, we introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert spaces. This leads to two practical tools: first, the FairCOCCO Score, a normalised metric that can quantify fairness in settings with single or multiple sensitive attributes of arbitrary type; and second, a subsequent regularisation term that can be incorporated into arbitrary learning objectives to obtain fair predictors. These contributions address crucial gaps in the algorithmic fairness literature, and we empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on both synthetic and real-world datasets.
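The FairCOCCO abstract above builds on kernel cross-covariance operators. As a rough sketch of the general idea (not the authors' actual metric), a related kernel dependence measure, HSIC, can score the dependence between predictions and a sensitive attribute of arbitrary type; all function names, kernel choices, and data below are illustrative:

```python
import numpy as np

def rbf_kernel(x, gamma=1.0):
    # Pairwise squared distances -> RBF Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-gamma * d2)

def hsic_score(preds, sens, gamma=1.0):
    # Biased empirical HSIC between predictions and sensitive attributes.
    # Works for continuous, binary, or one-hot-encoded categorical attributes.
    n = len(preds)
    K = rbf_kernel(np.asarray(preds, dtype=float).reshape(n, -1), gamma)
    L = rbf_kernel(np.asarray(sens, dtype=float).reshape(n, -1), gamma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

rng = np.random.default_rng(0)
a = rng.normal(size=200)                       # continuous sensitive attribute
dependent = hsic_score(a + 0.1 * rng.normal(size=200), a)
independent = hsic_score(rng.normal(size=200), a)
print(dependent > independent)                 # dependence yields a larger score
```

A normalised variant, as FairCOCCO proposes, would rescale such a score to a fixed range; the sketch only shows a raw kernel dependence measure.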
Tennison Liu · Alex Chan · Boris van Breugel · Mihaela van der Schaar

Sat 9:25 a.m. - 9:30 a.m. Contributed talk Q&A (Q&A)

Sat 9:30 a.m. - 9:40 a.m. Tensions Between the Proxies of Human Values in AI (Oral)
Motivated by mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g., privacy, fairness, and model transparency. Yet, we argue this is fundamentally misguided because these definitions are imperfect, siloed constructions of the human values they hope to proxy, while giving the guise that those values are sufficiently embedded in our technologies. Under popularized methods, tensions arise when practitioners attempt to achieve each pillar of fairness, privacy, and transparency in isolation or simultaneously. In this paper, we push for redirection. We argue that the AI community needs to consider alternative formulations of these pillars based on the context in which technology is situated. By leaning on sociotechnical systems research, we can formulate more compatible, domain-specific definitions of our human values for building more ethical systems.
Teresa Datta · Daniel Nissani · Max Cembalest · Akash Khanna · Haley Massa · John Dickerson

Sat 9:40 a.m. - 9:45 a.m. Contributed talk Q&A (Q&A)

Sat 9:45 a.m. - 9:55 a.m. Stochastic Differentially Private and Fair Learning (Oral)
A major concern with the use of machine learning (ML) models for high-stakes decision-making (e.g., criminal sentencing or commercial lending) is that these models sometimes discriminate against certain demographic groups (e.g., race, gender, age). Fair learning algorithms have been developed to address this issue, but these algorithms can still leak sensitive information (e.g., race, gender, age). Differential privacy (DP) guarantees that sensitive data cannot be leaked. Existing algorithms for DP fair learning are impractical for training large-scale models since they either (a) require computations on the full data set in each iteration of training, or (b) are not guaranteed to converge. In this paper, we provide the first efficient differentially private algorithm for fair learning that is guaranteed to converge, even when minibatches of data are used (i.e., stochastic optimization). Our framework is flexible enough to permit different fairness notions (e.g., demographic parity, equalized odds) and non-binary classification with multiple (non-binary) sensitive attributes. Along the way, we provide the first utility guarantee for a DP algorithm for solving nonconvex-strongly concave min-max problems. Extensive numerical experiments show that our algorithm consistently offers significant performance gains over state-of-the-art DP fair baselines. Moreover, our algorithm is amenable to large-scale ML with non-binary targets and non-binary sensitive attributes.
Andrew Lowy · Devansh Gupta · Meisam Razaviyayn

Sat 9:55 a.m. - 10:00 a.m. Contributed Talk Q&A (Live Questions)

Sat 10:00 a.m. - 11:00 a.m. In-person Roundtables - not livestreamed (Roundtables)

Sat 11:00 a.m. - 12:00 p.m. Lunch break

Sat 12:00 p.m. - 12:30 p.m. Invited Talk (Talk) Catuscia Palamidessi

Sat 12:30 p.m. - 12:45 p.m. Invited talk Q&A (Q&A)

Sat 12:45 p.m. - 12:50 p.m. Coffee break

Sat 12:50 p.m. - 1:00 p.m. Causality for Temporal Unfairness Evaluation and Mitigation (Oral)
Recent interest in causality for fair decision-making systems has been accompanied by great skepticism due to practical and epistemological challenges with applying existing causal fairness approaches. Existing works mainly seek to remove the causal effect of social categories such as race or gender along problematic pathways of an underlying DAG model. However, in practice DAG models are often unknown. Further, a single entity may not be held responsible for the discrimination along an entire causal pathway. Building on the "potential outcomes framework," this paper aims to lay out the necessary conditions for proper application of causal fairness. To this end, we propose a shift from postulating interventions on immutable social categories to their perceptions, and highlight two key aspects of interventions that are largely overlooked in the causal fairness literature: the timing and nature of manipulations. We argue that such conceptualization is key in evaluating the validity of causal assumptions and conducting sound causal analysis, including avoiding post-treatment bias. Additionally, choosing the timing of the intervention properly allows us to conduct fairness analyses at different points in a decision-making process. Our framework also addresses the limitations of fairness metrics that depend on statistical correlations. Specifically, we introduce causal variants of common statistical fairness notions and make a novel observation that under the causal framework there is no fundamental disagreement between different criteria. Finally, we conduct extensive experiments on synthetic and real-world datasets, including a case study on police stop-and-search decisions, and demonstrate the efficacy of our framework in evaluating and mitigating unfairness at various decision points.
Aida Rahmattalabi · Alice Xiang

Sat 1:00 p.m. - 1:05 p.m. Contributed talk Q&A (Q&A)

Sat 1:05 p.m. - 1:15 p.m. Causal Discovery for Fairness (Oral)
Fairness guarantees that ML decisions do not result in discrimination against individuals or minorities. Reliably identifying and measuring fairness/discrimination is better achieved using causality, which considers the causal relation, beyond mere association, between the sensitive attribute (e.g., gender, race, religion) and the decision (e.g., job hiring, loan granting). The big impediment to the use of causality to address fairness, however, is the unavailability of the causal model (typically represented as a causal graph). Existing causal approaches to fairness in the literature do not address this problem and assume that the causal model is available. In this paper, we do not make such an assumption, and we review the major algorithms for discovering causal relations from observable data. In particular, we show how different causal discovery approaches may result in different causal models and, most importantly, how even slight differences between causal models can have significant impact on fairness/discrimination conclusions.
Ruta Binkyte-Sadauskiene · Karima Makhlouf · Carlos Pinzon · Sami Zhioua · Catuscia Palamidessi

Sat 1:15 p.m. - 1:20 p.m. Contributed talk Q&A (Q&A)

Sat 1:20 p.m. - 1:30 p.m. Coffee break

Sat 1:30 p.m. - 2:15 p.m. Panel: Kristian Lum · Rachel Cummings · Jake Goldenfein · Sara Hooker · Joshua Loftus

Sat 2:15 p.m. - 2:55 p.m. Poster Session

Sat 2:50 p.m. - 2:55 p.m. Closing remarks

Posters

Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors (Poster)
The pursuit of long-term fairness involves the interplay between decision-making and the underlying data generating process. In this paper, through causal modeling with a directed acyclic graph (DAG) on the decision-distribution interplay, we investigate the possibility of achieving long-term fairness from a dynamic perspective. We propose "Tier Balancing," a technically more challenging but more natural notion to achieve in the context of long-term, dynamic fairness analysis. Different from previous fairness notions that are defined purely on observed variables, our notion goes one step further and deeper, capturing behind-the-scenes situation changes in the unobserved latent causal factors that directly carry the influence from the current decision to the future data distribution. Under the specified dynamics, we prove that in general one cannot achieve the long-term fairness goal only through one-step interventions. Furthermore, in the effort of approaching long-term fairness, we consider the mission of "getting closer to" the long-term fairness goal and present possibility and impossibility results accordingly.
Zeyu Tang · Yatong Chen · Yang Liu · Kun Zhang

Fairness of Interaction in Ranking under Exposure, Selection, and Trust Bias (Poster)
Ranking algorithms in online platforms serve not only users on the demand side, but also items on the supply side. Traditionally, ranking presents items in an order that maximizes their utility to users, by sorting them according to inferred relevance. The fairness of the interaction that different items receive as a result of such a ranking can have ethical, legal, and promotional implications. Interaction, however, is affected by various forms of bias, two of which have received considerable attention: exposure bias and selection bias. Exposure bias, also known as position or presentation bias, occurs due to the lower likelihood of observation in lower-ranked positions. Selection bias in the data occurs because interaction is not possible with items below an arbitrary cutoff position chosen by the front-end application at deployment time (i.e., showing only the top $k$ items). A less studied third form of bias, trust bias, is equally important, as it makes interaction depend on rank even after observation, by influencing the perceived relevance of items. This paper introduces a flexible fairness metric that captures interaction disparity in the presence of all three biases, and proposes a post-processing framework to trade off fairness and utility, which improves fairness toward items while maintaining user utility and outperforms state-of-the-art fair ranking algorithms.
Zohreh Ovaisi

Equality of Effort via Algorithmic Recourse (Poster)
This paper proposes a novel way of measuring fairness through equality of effort by applying algorithmic recourse through minimal interventions. Equality of effort is a property that can be quantified at both the individual and the group level. It answers the counterfactual question: what is the minimal cost for a protected individual, or the average minimal cost for a protected group of individuals, to reverse the outcome computed by an automated system? Algorithmic recourse increases the flexibility and applicability of the notion of equal effort: it overcomes its previous limitations by reconciling multiple treatment variables, introducing feasibility and plausibility constraints, and integrating the actual relative costs of interventions. We extend the existing definition of equality of effort and present an algorithm for its assessment via algorithmic recourse. We validate our approach both on synthetic data and on the German credit dataset.
Francesca Raimondi · Andrew Lawrence · Hana Chockler

Counterfactual Situation Testing: Fairness given the Difference (Poster)
We present counterfactual situation testing (cfST), a new tool for detecting discrimination in datasets that operationalizes the Kohler-Hausmann Critique (KHC) of "fairness given the difference". In situation testing (ST), like other discrimination analysis tools, the discrimination claim is recreated and thus tested by finding individuals similar to the one making the claim, the complainant $c$, and constructing a control group (what is) and a test group (what would have been if) of protected and non-protected individuals, respectively. ST builds both groups around $c$, which is wrong based on the KHC. Under cfST, we extend ST by constructing the control group around the complainant, which is the factual, and the test group around its counterfactual using the abduction, action, and prediction steps. We thus end up comparing control and test groups of not-so-similar individuals: one based on what we observe about $c$ versus one based on a hypothetical representation of $c$. By comparing these two different groups, we address the KHC and test for discrimination using a more meaningful causal interpretation of the protected attribute and its effects on all other attributes. We compare cfST to existing ST methods using synthetic data for a loan application process. The results show that cfST detects a higher number of discrimination cases than ST.
Jose Alvarez · Salvatore Ruggieri

Learning Counterfactually Invariant Predictors (Poster)
We propose a method to learn predictors that are invariant under counterfactual changes of certain covariates. This method is useful when the prediction target is causally influenced by covariates that should not affect the predictor output. For instance, this could prevent an object recognition model from being influenced by the position, orientation, or scale of the object itself. We propose a model-agnostic regularization term based on conditional kernel mean embeddings to enforce counterfactual invariance during training. We prove the soundness of our method, which can handle mixed categorical and continuous multivariate attributes. Empirical results on synthetic and real-world data demonstrate the efficacy of our method in a variety of settings.
Cecilia Casolo · Krikamol Muandet

Bias Mitigation Framework for Intersectional Subgroups in Neural Networks (Poster)
We propose a fairness-aware learning framework that mitigates intersectional subgroup bias associated with protected attributes. Prior research has primarily focused on mitigating one kind of bias by incorporating complex fairness-driven constraints into optimization objectives or designing additional layers that focus on specific protected attributes. We introduce a simple and generic bias mitigation approach that prevents models from learning relationships between protected attributes and the output variable by reducing the mutual information between them. We demonstrate that our approach is effective in reducing bias with little or no drop in accuracy. We also show that the models trained with our learning framework become causally fair and insensitive to the values of protected attributes. Finally, we validate our approach by studying feature interactions between protected and non-protected attributes. We demonstrate that these interactions are significantly reduced when applying our bias mitigation.
Narine Kokhlikyan · Bilal Alsallakh · Fulton Wang · Vivek Miglani · Aobo Yang · David Adkins

A Closer Look at the Calibration of Differentially Private Learners (Poster)
We systematically study the calibration of classifiers trained with differentially private stochastic gradient descent (DP-SGD) and observe miscalibration across a wide range of vision and language tasks. Our analysis identifies per-example gradient clipping in DP-SGD as a major cause of miscalibration, and we show that existing baselines for improving private calibration only provide small improvements in calibration error while occasionally causing large degradations in accuracy. As a solution, we show that differentially private variants of post-processing calibration methods such as temperature scaling and Platt scaling are surprisingly effective and have negligible utility cost to the overall model. Across 7 tasks, temperature scaling and Platt scaling with DP-SGD result in an average 55-fold reduction in the expected calibration error and incur at most a 1.59 percent drop in accuracy.
Hanlin Zhang · Xuechen (Chen) Li · Prithviraj Sen · Salim Roukos · Tatsunori Hashimoto

Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity (Poster)
Existing regulations often prohibit model developers from accessing protected attributes (gender, race, etc.) during training. This leads to scenarios where fairness assessments might need to be done on populations without knowing their memberships in protected groups. In such scenarios, institutions often adopt a separation between the model developers and a compliance team (who may have access to the entire dataset solely for auditing purposes). However, the model developers might be allowed to test their models for disparity by querying the compliance team for group fairness metrics. In this paper, we first demonstrate that simply querying for fairness metrics, such as statistical parity, can leak the protected attributes of individuals to the model developers. We demonstrate that there always exist strategies by which the model developers can identify the protected attribute of a targeted individual in the test dataset from just a single query. In particular, we show that one can reconstruct the protected attributes of all the individuals from $O(N_k \log(n/N_k))$ queries when $N_k \ll n$ using techniques from compressed sensing ($n$ is the size of the test dataset and $N_k$ is the size of the smallest group therein). Our results pose an interesting debate in algorithmic fairness: should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or does it constitute a violation of regulations and privacy if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this supposed violation of regulations and privacy, we also propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query function, outperforming naive techniques such as the Laplace mechanism.
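To make the single-query leakage idea above concrete, here is a minimal, hypothetical reconstruction (not the paper's actual attack, which uses compressed sensing for full-population reconstruction): the developer submits predictions labelling only the targeted individual positively, and the sign of the returned statistical parity value reveals that individual's group:

```python
def statistical_parity(preds, groups):
    """SP = P(yhat = 1 | group 0) - P(yhat = 1 | group 1), as a compliance
    team might compute it over the test dataset."""
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    return sum(g0) / len(g0) - sum(g1) / len(g1)

def single_query_attack(groups, target):
    # Developer-chosen query: positive prediction for the target only.
    preds = [1 if i == target else 0 for i in range(len(groups))]
    sp = statistical_parity(preds, groups)  # one fairness query to compliance
    return 0 if sp > 0 else 1               # the sign of SP leaks the group

secret_groups = [0, 1, 1, 0, 1, 0, 0, 1]    # held only by the compliance team
recovered = [single_query_attack(secret_groups, i)
             for i in range(len(secret_groups))]
print(recovered == secret_groups)
```

This is why the proposed defense adds noise calibrated to the query's (smooth) sensitivity: with exact answers, even a single query about a carefully chosen prediction vector is informative about one individual.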
We also include experimental results on the Adult dataset and on synthetic data covering a broad range of parameters.
Faisal Hamman · Jiahao Chen · Sanghamitra Dutta

I Prefer not to Say – Operationalizing Fair and User-guided Data Minimization (Poster)
To grant users greater authority over their personal data, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle within these regulations is data minimization, which urges companies and institutions to only collect data that is relevant and adequate for the purpose of the data analysis. In this work, we take a user-centric perspective on this regulation and let individual users decide which data they deem adequate and relevant to be processed by a machine-learned model. We require that users who decide to provide optional information should appropriately benefit from sharing their data, while users who rely on the mandate to leave their data undisclosed should not be penalized for doing so. This gives rise to the overlooked problem of fair treatment between individuals providing additional information and those choosing not to. While the classical fairness literature focuses on fair treatment between advantaged and disadvantaged groups, an initial look at this problem through the lens of classical fairness notions reveals that they are incompatible with these desiderata. We offer a solution to this problem by proposing the notion of Optional Feature Fairness (OFF), which follows from our requirements. To operationalize OFF, we derive a multi-model strategy and a tractable logistic regression model. We analyze the effect and the cost of applying OFF on several real-world data sets.
Tobias Leemann · Martin Pawelczyk · Christian Eberle · Gjergji Kasneci

Minimax Optimal Fair Regression under Linear Model (Poster)
We investigate the minimax optimal error of a fair regression problem under a linear model employing demographic parity as a fairness constraint. As a tractable demographic parity constraint, we introduce $(\alpha,\delta)$-fairness consistency, meaning that the quantified unfairness decreases at a rate of at most $n^{-\alpha}$ with probability at least $1-\delta$, where $n$ is the sample size. In other words, a consistently fair algorithm eventually outputs a regressor satisfying the demographic parity constraint with high probability as $n$ tends to infinity. As a result of our analyses, we find that the minimax optimal error under the $(\alpha,\delta)$-fairness consistency constraint is $\Theta(\frac{dM}{n})$ provided that $\alpha \le \frac{1}{2}$, where $d$ is the dimensionality and $M$ is the number of groups induced from the sensitive attributes.
Kazuto Fukuchi · Jun Sakuma

A Deep Dive into Dataset Imbalance and Bias in Face Identification (Poster)
As the deployment of automated face recognition (FR) systems proliferates, bias in these systems is not just an academic question, but a matter of public concern. Media portrayals often center imbalance as the main source of bias, i.e., that FR models perform worse on images of non-white people or women because these demographic groups are underrepresented in training data. Recent academic research paints a more nuanced picture of this relationship. However, previous studies of data imbalance in FR have focused exclusively on the face verification setting, while the face identification setting has been largely ignored, despite being deployed in sensitive applications such as law enforcement. This is an unfortunate omission, as "imbalance" is a more complex matter in identification: imbalance may arise not only in the training data, but also in the testing data, and furthermore may affect the proportion of identities belonging to each demographic group or the number of images belonging to each identity. In this work, we address this gap in the research by thoroughly exploring the effects of each kind of imbalance possible in face identification, and discuss other factors which may impact bias in this setting.
Valeriia Cherepanova · Steven Reich · Samuel Dooley · Hossein Souri · John Dickerson · Micah Goldblum · Tom Goldstein

Pragmatic Fairness: Optimizing Policies with Outcome Disparity Control (Poster)
We propose a framework for learning single-stage optimized policies in a way that is fair with respect to membership in a sensitive group. Unlike the fair prediction setting, we attempt to design an unseen, future policy that can reduce disparities while maximizing utility. Unlike other fair policy works, we take a pragmatic view: we ask what is the best we can do with the action space available to us, without relying on counterfactuals of the protected attributes or planning for an idealized "fair world." Specifically, we examine two scenarios: when it is not possible or necessary to reduce historical disparities among groups, but we can maintain or avoid increasing them by introducing a new policy; and when it is possible to reduce disparities while maximizing outcomes. We formulate controlling disparities in these two scenarios as avoiding differences in individual effects between a new and an old policy, and as smoothing out differences in expected outcomes across the space of sensitive attributes. We propose two policy design methods that can leverage observational data using causal assumptions, and illustrate their use in experiments with semi-synthetic models.
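As a toy sketch of the disparity-control idea above (illustrative only; the paper's methods rest on causal assumptions over observational data, which this sketch omits), one can compare candidate policies by expected outcome minus a penalty on the between-group outcome gap:

```python
def policy_score(outcomes_by_group, lam):
    """Utility minus a penalty on the largest between-group gap in expected outcome.
    outcomes_by_group: {group: list of outcomes under the candidate policy}.
    lam: weight trading off overall utility against outcome disparity."""
    means = {g: sum(v) / len(v) for g, v in outcomes_by_group.items()}
    total = sum(len(v) for v in outcomes_by_group.values())
    utility = sum(sum(v) for v in outcomes_by_group.values()) / total
    gap = max(means.values()) - min(means.values())
    return utility - lam * gap

# Hypothetical outcome samples per sensitive group under two candidate policies.
new_policy = {"a": [0.7, 0.8], "b": [0.6, 0.7]}
old_policy = {"a": [0.9, 0.8], "b": [0.3, 0.4]}
# The new policy trades a little utility for a much smaller disparity.
print(policy_score(new_policy, lam=1.0) > policy_score(old_policy, lam=1.0))
```

With lam = 0 this reduces to pure utility maximization; larger lam values encode the "maintain or reduce disparities" requirement from the first scenario in the abstract.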
Limor Gultchin · Siyuan Guo · Alan Malek · Silvia Chiappa · Ricardo Silva 🔗 - Counterfactual Risk Assessments under Unmeasured Confounding (Poster) Statistical risk assessments inform consequential decisions such as pretrial release in criminal justice, and loan approvals in consumer finance.Such risk assessments make counterfactual predictions, predicting the likelihood of an outcome under a proposed decision (e.g., what would happen if we approved this loan?).A central challenge is that there may have been unobserved confounders that jointly affected past decisions and outcomes in the historical data. We propose a tractable mean outcome sensitivity model that bounds the extent to which unmeasured confounders could affect outcomes on average. The mean outcome sensitivity model partially identifies the conditional likelihood of the outcome under the proposed decision as well as popular predictive performance metrics (accuracy, calibration, TPR, FPR, etc.) and commonly-used predictive disparities, and we derive their sharp identified sets.We then solve three tasks that are essential to deploying statistical risk assessments in high-stakes settings.First, we propose a learning procedure based on doubly-robust pseudo-outcomes that estimates bounds on the conditional likelihood of the outcome under the proposed decision, and derive a bound on its integrated mean square error.Second, we show how our estimated bounds on the conditional likelihood of the outcome under the proposed decision can be translated into a robust, plug-in decision-making policy, and derive bounds on its worst-case regret relative to the max-min optimal decision rule.Third, we develop estimators of the bounds on the predictive performance metrics of existing risk assessment that are based on efficient influence functions and cross-fitting, and only require black-box access to the risk assessment. 
Our final task is to use the historical data to robustly audit or evaluate the predictive fairness properties of an existing risk assessment under the mean outcome sensitivity model. Amanda Coston · Edward Kennedy · Ashesh Rambachan 🔗 - Perception as a Fairness Parameter (Poster) Perception refers to the process by which two or more agents make sense of the same information differently. Cognitive psychologists have long studied it, and more recently, the artificial intelligence (AI) community has also studied it due to its role in biased decision-making. Largely unexplored within the Fair AI literature, in this work we consider perception as a parameter of interest for tackling fairness problems and present the fair causal perception (FCP) framework. FCP allows an algorithmic decision-maker h to elicit group-specific representations, or perceptions, centered on a discrete protected attribute A to improve the information set X used to calculate the decision outcome h(X) = Y. This framework combines ontologies and structural causal models, resulting in a perspective-based causal model. Under FCP, the algorithmic decision-maker h can choose to enhance X depending on its fairness goals by re-interpreting it under A-specific perceptions, which means that the same individual instance can be classified differently depending on the evoked representation. We showcase the framework with an example based on a college admissions problem using synthetic data where, in the case of a tie between similar candidates with different values for socioeconomic background A, h non-randomly breaks the tie in favor of the under-privileged candidate. Using FCP, we describe what it means to be an applicant from the under-privileged group and how it causally affects the observed X in this college admissions context; and, in turn, we also describe local penalties to be introduced by h when classifying these applicants. 
Benchmarking individual fairness metrics, we compare how h derives fairer outcomes under FCP. Jose Alvarez · Mayra Russo 🔗 - Causal Fairness for Affect Recognition (Poster) Though research in algorithmic fairness is rapidly expanding, most existing work is tailored towards and benchmarked against social datasets. There is limited work that takes holistic account of the specific challenges unique to affect recognition. We outline some key challenges unique to affective computing and highlight how existing causal fairness methods and mechanisms are insufficient to fully address them. Jiaee Cheong · Sinan Kalkan · Hatice Gunes 🔗 - Simple improvements for better measuring private model disparities (Poster) Empirical studies have recently established that training differentially private models (with DP-SGD) results in disparities between classes. These works follow methodology from public (non-private) models in computing per-class accuracy and then comparing the worst-off class accuracy with other groups or with the overall accuracy. However, DP-SGD adds additional noise during model training and results in models whose prediction outputs vary across epochs and runs. Thus, it is largely unclear how to measure disparities in private models in the presence of noise, particularly when classes are not independent. In this work, we run extensive experiments by training state-of-the-art private models with various levels of privacy and find that DP training tends to over- or under-predict specific classes, leading to large variations in disparities between classes.
Judy Hanwen Shen · Soham De · Sam Smith · Jamie Hayes · Leonard Berrada · David Stutz · Borja De Balle Pigem 🔗 - Towards a genealogical approach to explaining algorithmic bias (Poster) In the FAccT literature specifically, algorithmic bias tends to be characterized as a problem in its consequences rather than as evidence of the underlying societal and technical conditions that (re)produce it. In this context, explainability (XAI) tools are proposed as a solution to gauge these conditions (e.g., SHAP and LIME, as well as libraries such as What If or IBM360). While relevant, these tools tend to approach these conditions unrealistically: as static, cumulative, and in terms of their causal import. Instead, I propose here that these tools be informed by a genealogical approach to bias. Following the tradition of Nietzsche and Foucault, a genealogy is “a form of historical critique, designed to overturn our norms by revealing their origins” (Hill, 2016, p. 1). In this case, I understand genealogy as a form of epistemic critique, designed to understand algorithmic bias in its consequences by focusing on the conditions for its possibility. In this respect, I propose to question XAI tools as much as to use them as questions, rather than as replies, to the problem of bias as skewed performance. This work puts forward two proposals. First, we propose a framework to index XAI tools according to their relevance for bias as evidence. We identify feature importance methods (e.g., SHAP) and rule-list methods as relevant for procedural fairness, while we identify counterfactual methods as relevant to (a) agency, in terms of suggesting what can be changed to affect an outcome, and (b) building a prima facie case for discrimination. Second, we propose a rubric of questions to test these tools in their ability to detect so-called “bias shifts”.
Overall, the aim is to think about XAI approaches not as mere technical tools but as questions on skewed performance for evidence gathering with fairness implications. Marta Ziosi 🔗 - Addressing observational biases in algorithmic fairness assessments (Poster) The objective of this early work is to characterize the implications for model development and evaluation of observational biases with subgroup-dependent structure (including selection bias, measurement error, label bias, or censoring that differs across subgroups), extending work that aims to characterize and resolve conflicts among statistical fairness criteria in the absence of such biases. These biases pose challenges because naive approaches to model fitting produce statistically biased results, with potential fairness harms induced by systematic, differential misestimation across subgroups, and it is challenging, or in some contexts impossible, to detect such biases without additional data or domain knowledge. As an example, we present an illustrative case study in a setting with differential censoring across subgroups. Chirag Nagpal · Olawale Salaudeen · Sanmi Koyejo · Stephen Pfohl 🔗 - Parity in predictive performance is neither necessary nor sufficient for fairness (Poster) TL;DR: Parity in Predictive Performance (PPP) holds that a machine learning model is fair if and only if its predictive performance (by some measure) is (approximately) equal across groups of interest. We argue that this assumes that groups are equally difficult to predict, which is unlikely to hold in practice. Absent this assumption, a model could be fair but not satisfy PPP, or be unfair yet satisfy PPP. Thus, PPP is neither necessary nor sufficient for fairness. We propose a new definition of fairness, Relative Realised PPP (R2P3), to account for these situations. Justin Engelmann · Miguel Bernabeu · Amos Storkey 🔗 - Caused by Race or Caused by Racism?
Limitations in Envisioning Fair Counterfactuals (Poster) Causal modeling is often valued for its interpretability in attributing cause and defining counterfactuals. However, these framings are fundamentally ideological, and may not align with political or sociological understandings of structural inequality or actions of resistance by marginalized people. We outline high-level conceptual conflicts between statistically modeling causal effects of race and sociologically understanding causal effects of racism. By drawing upon Disability Studies, we trace the logic of counterfactuals in social movements and theories to demonstrate how complicating notions of constructed social groups gives rise to differing definitions of fairness. These different counterfactual perspectives create systematic differences in calculated causal quantities, leading to common-sense fairness shortfalls in cases of assimilation, e.g., racism driving forced proximity to whiteness. We advocate for creating a formalized approach to defining these alternative constructions of fairness, articulating political-sociological limitations in counterfactual interpretations, establishing evaluation criteria for conflict with social change movements, and exploring possible interfaces between causal models and other fairness definitions. Evan Dong 🔗 - Can Causal (or Counterfactual) Representations benefit from Quantum Computing? (Poster) Causal questions permeate our daily lives in industries such as healthcare and the legal system. Causality, the study of causes and effects, first emerged in the field of philosophy and has since spread into common machine learning practice as a way to incorporate prior knowledge about relationships between features, as in Pearl [2009]. Although there have been papers on causal estimation and effects for quantum computing, e.g., Barrett et al.
[2019], in this abstract we hope to spark discussion around the converse case, that is, how latent causal representations might be modelled through quantum computing. Rakshit Naidu · Daniel Justice 🔗 - A Bayesian Causal Inference Approach for Assessing Fairness in Clinical Decision-Making (Poster) Fairness in clinical decision-making is a critical element of health equity, but assessing the fairness of clinical decisions from observational data is challenging. Recently, many fairness notions have been proposed to quantify fairness in decision-making, among which causality-based fairness notions have gained increasing attention due to their potential for adjusting for confounding and reasoning about bias. However, causal fairness notions remain under-explored in the context of clinical decision-making with large-scale healthcare data. In this work, we propose a Bayesian causal inference approach for assessing a causal fairness notion called principal fairness in clinical settings. We demonstrate our approach using both simulated data and electronic health records (EHR) data. Linying Zhang · Lauren Richter · Yixin Wang · Anna Ostropolets · Noemie Elhadad · David Blei · George Hripcsak 🔗 - "You Can't Fix What You Can't Measure": Privately Measuring Demographic Performance Disparities in Federated Learning (Poster) As in traditional machine learning models, models trained with federated learning may exhibit disparate performance across demographic groups. Model holders must identify these disparities to mitigate undue harm to the groups. However, measuring a model's performance on a group requires access to information about group membership which, for privacy reasons, often has limited availability. We propose novel locally differentially private mechanisms to measure differences in performance across groups while protecting the privacy of group membership.
To analyze the effectiveness of the mechanisms, we bound their error in estimating a disparity when optimized for a given privacy budget. Our results show that the error rapidly decreases for realistic numbers of participating clients, demonstrating that, contrary to what prior work suggested, protecting privacy is not necessarily in conflict with identifying performance disparities of federated models. Marc Juarez · Aleksandra Korolova 🔗 - Privacy Aware Experimentation over Sensitive Groups: A General Chi Square Approach (Poster) As companies work to provide the best possible experience for members, users, and customers, it is crucial to understand how different people, particularly individuals from sensitive groups, have different experiences. For example, do women visit our platform less frequently than members of other genders? Or are people with disabilities disproportionately affected by a change to our user interface? However, to run the statistical tests or form the estimates needed to answer these questions, we need to know sensitive attributes. When dealing with personal data, privacy techniques should be considered, especially when dealing with sensitive groups, e.g., race/ethnicity or gender. We study a new privacy model where users belong to certain sensitive groups, and we show how to conduct statistical inference on whether there are significant differences in outcomes between the various groups. We introduce a general chi-squared test that accounts for differential privacy in group membership, and we show how this covers a broad set of hypothesis tests, improving statistical power over tests that ignore the noise due to privacy. Rina Friedberg · Ryan Rogers 🔗 - When Fairness Meets Privacy: Fair Classification with Semi-Private Sensitive Attributes (Poster) Machine learning models have demonstrated promising performance in many areas.
However, concerns that they can be biased against specific groups hinder their adoption in high-stakes applications. Thus, it is essential to ensure fairness in machine learning models. Most previous efforts require access to sensitive attributes for mitigating bias. Nevertheless, it is often infeasible to obtain data with sensitive attributes at scale due to people's increasing awareness of privacy and to legal compliance. Therefore, an important research question is how to make fair predictions under privacy constraints. In this paper, we study a novel problem of fair classification in a semi-private setting, where most of the sensitive attributes are private and only a small number of clean ones are available. To this end, we propose a novel framework, FairSP, that first learns to correct the noisy sensitive attributes under a privacy guarantee by exploiting the limited clean ones. It then jointly models the corrected and clean data in an adversarial way for debiasing and prediction. Theoretical analysis shows that the proposed model can ensure fairness when most sensitive attributes are private. Extensive experimental results on real-world datasets demonstrate the effectiveness of the proposed model at making fair predictions under privacy while maintaining high accuracy. Canyu Chen · Yueqing Liang · Xiongxiao Xu · Shangyu Xie · Yuan Hong · Kai Shu 🔗 - Fairness Certificates for Differentially Private Classification (Poster) In this work, we theoretically study the impact of differential privacy on fairness in binary classification. We prove that, given a class of models, popular group fairness measures are pointwise Lipschitz-continuous with respect to the parameters of the model. This result is a consequence of a more general statement on the probability that a decision function makes a negative prediction conditioned on an arbitrary event (such as membership in a sensitive group), which may be of independent interest.
We use the aforementioned Lipschitz property to prove a high-probability bound showing that, given enough examples, the fairness level of private models is close to that of their non-private counterparts. Paul Mangold · Michaël Perrot · Marc Tommasi · Aurélien Bellet 🔗 - Conditional Demographic Parity Through Optimal Transport (Poster) In this paper, we propose a regularization-based approach to impose conditional demographic parity in supervised learning problems. While many methods exist to achieve demographic parity, conditional demographic parity can be much more challenging to achieve, particularly when the conditioning variables are continuous or discrete with many levels. Our regularization approach is based on a probability distribution distance called the bi-causal transport distance, proposed in the optimal transport literature. Our method utilizes a single regularization term whose computational cost is $O(n^2)$ in the sample size, regardless of the dimension of the conditioning variables or whether those variables are continuous or discrete. We also target full independence of the conditional distributions, rather than only the first moments as many existing methods for demographic parity do. We validate the efficacy of our approach with experiments on real-world and synthetic datasets. Luhao Zhang · Mohsen Ghassemi · Ivan Brugere · Niccolo Dalmasso · Alan Mishler · Vamsi Potluru · Tucker Balch · Manuela Veloso 🔗 - Adjusting the Gender Wage Gap with a Low-Dimensional Representation of Job History (Poster) To estimate adjusted gender wage gaps, economists build models that predict wage from observable data. Although an individual's complete job history may be predictive of wage, economists typically summarize experience with hand-constructed summary statistics about the past. In this work, we estimate the adjusted gender wage gap from an individual's entire job history by learning a low-dimensional representation of their career.
We develop a transformer-based representation model that is pretrained on massive, passively collected resume data and then fine-tuned to predict wages on the small, nationally representative survey data that economists use for wage gap estimation. This dimension-reduction approach produces unbiased estimates of the adjusted wage gap as long as each representation corresponds to the same full job history for males and females. We discuss how this condition relates to the sufficiency fairness criterion; although the adjusted wage gap is not a causal quantity, we take inspiration from the high-dimensional confounding literature to assess and mitigate violations of sufficiency. We validate our approach with experiments on semi-synthetic and real-world data. Our method makes more accurate wage predictions than economic baselines. When applied to wage survey data in the United States, our method finds that a substantial portion of the gender wage gap can be attributed to differences in job history, although this proportion varies by year and across sub-populations. Keyon Vafa · Susan Athey · David Blei 🔗 - Privacy-Preserving Group Fairness in Cross-Device Federated Learning (Poster) Group fairness ensures that the outcomes of machine learning (ML) based decision-making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not giving access to the clients' data. As we show in this paper, this conflict between fairness and privacy in FL can be resolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP).
In doing so, we propose a method for training group-fair ML models in cross-device FL under complete and formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values. Empirical evaluations on real-world datasets demonstrate the effectiveness of our solution for training fair and accurate ML models in federated cross-device setups with privacy guarantees for the users. Sikha Pentyala · Nicola Neophytou · Anderson Nascimento · Martine De Cock · Golnoosh Farnadi 🔗 - Predictive Multiplicity in Probabilistic Classification (Poster) There may exist multiple models that perform almost equally well for any given prediction task. Related to the role of counterfactuals in studies of discrimination, we examine how individual predictions vary among these alternative competing models. In particular, we study predictive multiplicity in probabilistic classification. We formally define measures for our setting and develop optimization-based methods to compute them. We demonstrate how multiplicity can disproportionately impact marginalized individuals, and we apply our methodology to gain insight into why predictive multiplicity arises. Given our results, future work could explore how multiplicity relates to causal fairness. Jamelle Watson-Daniels · David Parkes · Berk Ustun 🔗
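Several of the posters above measure fairness properties while collecting sensitive group membership under local differential privacy. As a concrete illustration of how such collection can work, here is a minimal sketch of the classical k-ary randomized response mechanism with server-side debiasing. This is a textbook mechanism, not the specific mechanism proposed in any of the papers above, and the function names and parameters are illustrative only.

```python
import math
import random

def rr_perturb(true_value, k, epsilon, rng=random):
    """k-ary randomized response: report the true category (an int in
    0..k-1) with probability e^eps / (e^eps + k - 1), otherwise report a
    uniformly random *other* category. Satisfies eps-local DP."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return true_value
    return rng.choice([v for v in range(k) if v != true_value])

def rr_debias(reported_counts, k, epsilon):
    """Unbiased estimates of the true per-group counts from noisy reports.
    Each reported count satisfies E[reported_v] = n_v * p + (n - n_v) * q,
    with p and q as below, so inverting gives an unbiased estimator."""
    n = sum(reported_counts)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    return [(c - n * q) / (p - q) for c in reported_counts]
```

With debiased per-group counts (and, e.g., per-group counts of correct predictions collected the same way), one can form disparity estimates or noise-aware test statistics; as in the federated-measurement result above, the estimator's extra variance shrinks as the number of reporting clients grows.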
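The DP-SGD disparity poster above observes that private training produces prediction outputs that vary across runs, which makes single-run worst-class accuracy an unreliable disparity measure. A simple way to account for this, sketched below under that observation (this is an illustration of the general idea, not the paper's protocol; all names are hypothetical), is to average per-class accuracy over several independent training runs and report the run-to-run spread alongside the gap.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy restricted to the examples of each class."""
    return [float((y_pred[y_true == c] == y_true[y_true == c]).mean())
            for c in range(num_classes)]

def disparity_stats(runs_y_pred, y_true, num_classes):
    """Average per-class accuracy over runs, the gap between the
    best and worst classes, and the per-class run-to-run std."""
    A = np.array([per_class_accuracy(y_true, yp, num_classes)
                  for yp in runs_y_pred])  # runs x classes
    mean_acc = A.mean(axis=0)
    worst_gap = float(mean_acc.max() - mean_acc.min())
    run_std = A.std(axis=0)
    return mean_acc, worst_gap, run_std
```

Reporting `run_std` alongside `worst_gap` makes explicit how much of an observed disparity could be an artifact of the noise that DP-SGD injects into a single training run.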