AI learns human biases: In word vector space, “man is to computer programmer as woman is to homemaker,” as one paper put it. New research helps language models unlearn such prejudices.
What’s new: Double-Hard Debias improves on a previous algorithm to mitigate gender bias in trained language generators. Tianlu Wang developed the method with researchers at the University of Virginia and Salesforce.
Key insight: The earlier Hard Debias works by identifying a masculine-to-feminine dimension in word vectors. Words that don’t have gender-specific meanings but, in popular word embeddings, fall toward either end of this axis (such as doctor and nurse) are considered biased. Hard Debias compensates by removing each such word’s component along this axis. However, other work shows that the relative frequency of words in various contexts distorts the feature space. For instance, grandfather appears as a genderless verb in legal discussions, where it means “to exempt,” while grandmother doesn’t, and that difference deforms grandfather’s gender dimension. Removing the dimension that encodes such alternative uses should make Hard Debias more effective.
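To make the geometry concrete, here is a minimal numpy sketch of the Hard Debias idea: estimate a gender axis from definitional word pairs and project it out of gender-neutral words’ vectors. The embedding table `emb`, the word-pair list, and the averaging shortcut are illustrative assumptions, not the authors’ implementation (the original Hard Debias paper derives the axis with PCA over such pairs).

```python
import numpy as np

def gender_direction(emb, pairs=(("he", "she"), ("man", "woman"), ("father", "mother"))):
    """Estimate a masculine-to-feminine axis from definitional word pairs.

    The Hard Debias paper derives this axis with PCA over such pairs;
    averaging the normalized difference vectors is a rough stand-in.
    """
    diffs = [emb[m] - emb[f] for m, f in pairs]
    g = np.mean([d / np.linalg.norm(d) for d in diffs], axis=0)
    return g / np.linalg.norm(g)


def hard_debias(emb, neutral_words, g):
    """Zero out the gender-axis component of gender-neutral words' vectors."""
    debiased = dict(emb)
    for w in neutral_words:
        v = emb[w]
        debiased[w] = v - np.dot(v, g) * g  # project out the gender direction
    return debiased
```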
How it works: Double-Hard Debias removes this frequency-related dimension before adjusting for gender bias. (It doesn’t affect the processing of inherently gendered words identified by the researchers, such as he and she.) The researchers applied their method to several models that extract word embeddings, including the popular GloVe. The steps below outline the procedure; a simplified code sketch follows the list.
- Double-Hard Debias first identifies the most gender-biased words: those whose gender dimension falls farthest from the mean.
- It finds the dimensions of the embedding space that capture the most variability (its top principal components). These dimensions are the most likely to distort the gender axis and are therefore candidates for removal.
- It selects the candidate dimension with the most impact on gender by determining the effect of removing it on the gender-bias dimension of the words identified in the first step.
- Then it removes the selected frequency dimension from all word vectors.
- Finally, the original Hard Debias algorithm recomputes the gender direction in the revised word vectors and removes gender-neutral words’ components along it.
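The pipeline above can be sketched in numpy as follows, reusing gender_direction and hard_debias from the earlier sketch. This is a simplified reconstruction under stated assumptions, not the authors’ released code: the bias test used to pick the frequency direction is a stand-in for the paper’s clustering-based check, and emb, biased_words, and neutral_words are hypothetical inputs.

```python
import numpy as np

def pick_frequency_direction(emb, biased_words, g, n_candidates=5):
    """Choose the principal component whose removal most weakens the gender signal.

    Simplified stand-in for the paper's selection step (which uses a clustering
    test on the most-biased words): here we measure the mean |projection| of
    those words onto the gender axis g after dropping each candidate direction.
    """
    X = np.stack(list(emb.values()))
    # Top principal components of the centered embedding matrix serve as
    # candidate frequency-related directions.
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    candidates = vt[:n_candidates]

    def residual_bias(u):
        dropped = [emb[w] - np.dot(emb[w], u) * u for w in biased_words]
        return float(np.mean([abs(np.dot(v, g)) for v in dropped]))

    return min(candidates, key=residual_bias)


def double_hard_debias(emb, neutral_words, biased_words, g):
    """Remove the chosen frequency direction from every vector, then run Hard Debias."""
    u = pick_frequency_direction(emb, biased_words, g)
    shifted = {w: v - np.dot(v, u) * u for w, v in emb.items()}
    g_new = gender_direction(shifted)  # re-estimate the axis (from the earlier sketch)
    return hard_debias(shifted, neutral_words, g_new)
```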
Results: The researchers applied Double-Hard Debias and Hard Debias to separate models. They trained the models on two data subsets drawn from the OntoNotes corpus. One was made up of biased statements (say, pairing doctor with he). The other comprised anti-biased statements (for instance, pairing doctor with she). Then they asked the models who he and she referred to. The difference in the Hard Debias model’s F1 scores when tested on the biased and anti-biased data was 19.7. The difference in the Double-Hard Debias model’s F1 scores was 7.7, showing that gender had a far smaller impact on its performance in the task.
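To make the metric concrete: each model’s F1 is computed separately on the biased and anti-biased splits, and the bias measure is simply the absolute difference between the two. The sketch below uses placeholder precision and recall values, not the paper’s figures.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Placeholder scores for one debiased model on each split -- not the paper's numbers.
f1_biased = f1(precision=0.82, recall=0.78) * 100  # biased (pro-stereotypical) sentences
f1_anti = f1(precision=0.74, recall=0.70) * 100    # anti-biased (anti-stereotypical) sentences
gap = abs(f1_biased - f1_anti)  # smaller gap -> gender sways the model less
print(f"F1 gap: {gap:.1f}")
```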
Why it matters: Bias in machine learning is a serious problem. A medical language model that assumes all doctors are male and all nurses female could make serious mistakes when reading medical reports. Similarly, a legal platform that equates sexual assault victim with female could lead to unjust outcomes. Solutions like this are crucial stopgaps on the way to developing less biased datasets. The method’s authors told The Batch that Double-Hard Debias could be applied to other types of bias, too.
We’re thinking: If you’re building an NLP system, bias often won’t show up in metrics like relevance or BLEURT scores. But it’s important to attend to it anyway, because bias can have a significant, unforeseen impact on users. We need the whole AI community to work hard to reduce undesirable biases wherever possible.