Workshop
Algorithmic Fairness through the lens of Metrics and Evaluation
Awa Dieng · Miriam Rateike · Jamelle Watson-Daniels · Golnoosh Farnadi · Nando Fioretto
West Meeting Room 111, 112
Sat 14 Dec, 9 a.m. PST
We are proposing the Algorithmic Fairness through the lens of Metrics and Evaluation (AFME) workshop, the fifth edition of this workshop series on algorithmic fairness. While previous editions have explored foundational work on causal approaches to fairness and the intersection of fairness with other fields of trustworthy machine learning, namely interpretability, robustness, privacy, and temporal aspects, this year’s workshop aims to offer a timely reflection on fairness metric definitions and evaluation methods.

Indeed, with rapid advances in large generative models and international regulatory efforts, as well as pertinent calls to understand fairness in context, it is crucial to revisit the suitability of existing fairness metrics and to explore new bias evaluation frameworks. Our workshop aims to provide a venue for rigorous interdisciplinary discussions around these critical topics and to foster reflection on the necessity of, and challenges in, defining adaptable fairness metrics and designing reliable evaluation techniques.

## Topic

The discussion on defining and measuring algorithmic (un)fairness was predominantly a focus of the early stages of algorithmic fairness research [Dwork et al., 2012, Zemel et al., 2013, Hardt et al., 2016, Zafar et al., 2017, Agarwal et al., 2018], resulting in four main fairness denominations: individual or group [Binns, 2020], statistical or causal [Makhlouf et al., 2023], equalizing or non-equalizing [Diana et al., 2021], and temporal or non-temporal fairness [Rateike, 2024]. Since then, much work in the field has been dedicated to providing methodological advances within each denomination and to understanding the trade-offs between fairness metrics [Binns, 2020, Heidari et al., 2019, Kleinberg et al., 2017]. However, given the changing machine learning landscape, with both increasing global applications and the emergence of large generative models, the question of understanding and defining what constitutes “fairness” in these systems has become paramount again.

On the one hand, definitions of algorithmic fairness are being critically examined regarding the historical and cultural values they encode [Asiedu et al., 2024, Arora et al., 2023, Bhatt et al., 2022]. The mathematical conceptualization of these definitions, and their operationalization through satisfying statistical parities (two canonical examples are sketched below), have also drawn criticism for not taking into account the context within which these systems are deployed [Weinberg, 2022, Green and Hu, 2018].

On the other hand, it is still unclear how to reconcile standard fairness metrics and evaluations, developed mainly for prediction and classification tasks, with large generative models. While some works have proposed adapting existing fairness metrics, e.g., to large language models [Li et al., 2023, Zhang et al., 2023, Gallegos et al., 2023], questions remain on how to systematically measure fairness for textual outputs, or even for multi-modal generative models [Schmitz et al., 2022, Chen et al., 2023, Lum et al., 2024]. Large generative models also pose new challenges for fairness evaluation, with recent work showing how biases towards specific tokens in large language models can influence fairness assessments during evaluation [Ding et al., 2024]. Finally, regulatory requirements introduce new challenges in defining, selecting, and assessing algorithmic fairness [Deck et al., 2024, Laux et al., 2024, Hellman, 2020].

Given these critical and timely considerations, this workshop aims to investigate how to define and evaluate (un)fairness in today’s machine learning landscape.
We are particularly interested in addressing open questions in the field, such as:

- Through a retrospective lens, what are the strengths and limitations of existing fairness metrics?
- How to operationalize contextual definitions of fairness in diverse deployment domains?
- Given the plethora of use cases, how to systematically evaluate fairness and bias in large generative models?
- How do recent regulatory efforts demand the utilization of fairness metrics and evaluation techniques, and do existing ones comply with regulations?
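For readers less familiar with the metrics referenced above, here is a minimal sketch of two canonical group fairness criteria, statistical (demographic) parity [Dwork et al., 2012] and equalized odds [Hardt et al., 2016]; the notation, with $\hat{Y}$ the model’s prediction, $Y$ the true label, and $A$ the sensitive attribute, is ours rather than the workshop’s.

```latex
% Statistical (demographic) parity: equal positive prediction rates across groups
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \qquad \forall\, a, b

% Equalized odds: equal true- and false-positive rates across groups
P(\hat{Y} = 1 \mid A = a, Y = y) = P(\hat{Y} = 1 \mid A = b, Y = y) \qquad \forall\, a, b,\ y \in \{0, 1\}
```

Both criteria were formulated for classification with a well-defined prediction and label, which is precisely why their transfer to generative models raises the open questions listed above.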