The blind application of machine learning runs the risk of amplifying biases present in data. We face such a danger with word embeddings, a popular framework for representing text data as vectors that has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender-definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
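The geometric idea in the abstract (bias captured by a direction, which is then removed from gender-neutral words) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `gender_direction` and `neutralize`, the made-up 3-dimensional vectors, and the choice to estimate the direction by averaging normalized difference vectors of a few word pairs. The abstract does not spell out the procedure, so this should be read as one simple way to project out a direction, not as the authors' algorithm.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(pairs):
    """Estimate a direction by averaging normalized (female - male) differences.

    Averaging is a simplification assumed for this sketch.
    """
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for female, male in pairs:
        diff = normalize([f - m for f, m in zip(female, male)])
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)

def neutralize(vec, direction):
    """Remove the component of vec along direction, then re-normalize."""
    proj = dot(vec, direction)
    residual = [a - proj * d for a, d in zip(vec, direction)]
    return normalize(residual)

# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
emb = {
    "she": [0.8, 0.1, 0.2],
    "he": [-0.8, 0.1, 0.2],
    "woman": [0.7, 0.3, 0.1],
    "man": [-0.7, 0.3, 0.1],
    "receptionist": [0.4, 0.6, 0.3],
}

g = gender_direction([(emb["she"], emb["he"]), (emb["woman"], emb["man"])])
debiased = neutralize(emb["receptionist"], g)

print("component before:", dot(emb["receptionist"], g))
print("component after: ", dot(debiased, g))
```

In this toy setup the gender-neutral word keeps its other coordinates (up to rescaling) while its component along the estimated direction drops to zero. Gender-definition words such as queen would be left out of the neutralization step so that their desired association is preserved.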
Author Information
Tolga Bolukbasi (Boston University)
Kai-Wei Chang (UCLA)
James Y Zou (Microsoft Research)
Venkatesh Saligrama (Boston University)
Adam T Kalai (Microsoft Research)
Adam Tauman Kalai received his BA (1996) from Harvard, and his MA (1998) and PhD (2001) from CMU under the supervision of Avrim Blum. After an NSF postdoctoral fellowship at M.I.T. with Santosh Vempala, he served as an assistant professor at the Toyota Technological Institute at Chicago and then at Georgia Tech. He is now a Senior Research Scientist at Microsoft Research New England. His honors include an NSF CAREER award and an Alfred P. Sloan fellowship. His research focuses on computational learning theory, game theory, algorithms, and online optimization.
More from the Same Authors
-
2021 Spotlight: Online Selective Classification with Limited Feedback »
Aditya Gangrade · Anil Kag · Ashok Cutkosky · Venkatesh Saligrama -
2021 : Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency »
Samarth Mishra · Kate Saenko · Venkatesh Saligrama -
2022 : Group Excess Risk Bound of Overparameterized Linear Regression with Constant-Stepsize SGD »
Arjun Subramonian · Levent Sagun · Kai-Wei Chang · Yizhou Sun -
2022 : Empowering Language Models with Knowledge Graph Reasoning for Question Answering »
Ziniu Hu · Yichong Xu · Wenhao Yu · Shuohang Wang · Ziyi Yang · Chenguang Zhu · Kai-Wei Chang · Yizhou Sun -
2022 Poster: On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs »
Arjun Subramonian · Kai-Wei Chang · Yizhou Sun -
2022 Poster: Semantic Probabilistic Layers for Neuro-Symbolic Learning »
Kareem Ahmed · Stefano Teso · Kai-Wei Chang · Guy Van den Broeck · Antonio Vergari -
2022 Poster: Controllable Text Generation with Neurally-Decomposed Oracle »
Tao Meng · Sidi Lu · Nanyun Peng · Kai-Wei Chang -
2022 Poster: How Transferable are Video Representations Based on Synthetic Data? »
Yo-whan Kim · Samarth Mishra · SouYoung Jin · Rameswar Panda · Hilde Kuehne · Leonid Karlinsky · Venkatesh Saligrama · Kate Saenko · Aude Oliva · Rogerio Feris -
2022 Poster: Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering »
Pan Lu · Swaroop Mishra · Tanglin Xia · Liang Qiu · Kai-Wei Chang · Song-Chun Zhu · Oyvind Tafjord · Peter Clark · Ashwin Kalyan -
2021 Poster: Online Selective Classification with Limited Feedback »
Aditya Gangrade · Anil Kag · Ashok Cutkosky · Venkatesh Saligrama -
2021 Poster: Bandit Quickest Changepoint Detection »
Aditya Gopalan · Braghadeesh Lakshminarayanan · Venkatesh Saligrama -
2020 Poster: Learning to Approximate a Bregman Divergence »
Ali Siahkamari · XIDE XIA · Venkatesh Saligrama · David Castañón · Brian Kulis -
2020 Poster: Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond »
Kaidi Xu · Zhouxing Shi · Huan Zhang · Yihan Wang · Kai-Wei Chang · Minlie Huang · Bhavya Kailkhura · Xue Lin · Cho-Jui Hsieh -
2020 Poster: Online Algorithm for Unsupervised Sequential Selection with Contextual Information »
Arun Verma · Manjesh Kumar Hanawal · Csaba Szepesvari · Venkatesh Saligrama -
2020 Poster: Limits on Testing Structural Changes in Ising Models »
Aditya Gangrade · Bobak Nazer · Venkatesh Saligrama -
2019 Poster: Efficient Near-Optimal Testing of Community Changes in Balanced Stochastic Block Models »
Aditya Gangrade · Praveen Venkatesh · Bobak Nazer · Venkatesh Saligrama -
2019 Poster: Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices »
Don Dennis · Durmus Alp Emre Acar · Vikram Mandikal · Vinu Sankar Sadasivan · Venkatesh Saligrama · Harsha Vardhan Simhadri · Prateek Jain -
2017 Poster: Adaptive Classification for Prediction Under a Budget »
Feng Nan · Venkatesh Saligrama -
2016 Workshop: Machine Learning in Computational Biology »
Gerald Quon · Sara Mostafavi · James Y Zou · Barbara Engelhardt · Oliver Stegle · Nicolo Fusi -
2016 Poster: Pruning Random Forests for Prediction on a Budget »
Feng Nan · Joseph Wang · Venkatesh Saligrama -
2016 Poster: A Credit Assignment Compiler for Joint Prediction »
Kai-Wei Chang · He He · Stephane Ross · Hal Daumé III · John Langford -
2015 : Discovering Salient Features via Adaptively Chosen Comparisons »
James Y Zou -
2015 Poster: Efficient Learning by Directed Acyclic Graph For Resource Constrained Prediction »
Joseph Wang · Kirill Trapeznikov · Venkatesh Saligrama -
2014 Poster: Efficient Minimax Signal Detection on Graphs »
Jing Qian · Venkatesh Saligrama -
2013 Poster: Contrastive Learning Using Spectral Methods »
James Y Zou · Daniel Hsu · David Parkes · Ryan Adams -
2012 Poster: Local Supervised Learning through Space Partitioning »
Joseph Wang · Venkatesh Saligrama -
2012 Poster: Priors for Diversity in Generative Latent Variable Models »
James Y Zou · Ryan Adams -
2010 Poster: Probabilistic Belief Revision with Structural Constraints »
Peter B Jones · Venkatesh Saligrama · Sanjoy K Mitter -
2009 Poster: Anomaly Detection with Score functions based on Nearest Neighbor Graphs »
Manqi Zhao · Venkatesh Saligrama -
2009 Spotlight: Anomaly Detection with Score functions based on Nearest Neighbor Graphs »
Manqi Zhao · Venkatesh Saligrama -
2008 Tutorial: Agnostic Learning: Algorithms and Theory »
Adam T Kalai -
2006 Poster: An Approach to Bounded Rationality »
Eli Ben-Sasson · Adam T Kalai · Ehud Kalai -
2006 Talk: An Approach to Bounded Rationality »
Eli Ben-Sasson · Adam T Kalai · Ehud Kalai