How can we build large-scale language and vision models that don’t inherit social biases? Conventional wisdom suggests training on larger datasets, but research challenges this assumption.
What’s new: Abeba Birhane at Trinity College Dublin, a colleague at Michigan State University, and two independent researchers analyzed publicly available text-image datasets for their proportion of hateful content (that is, content that belittles based on race or gender) and audited models trained on them for racial bias. They found that larger training sets can push models toward greater bias.
Key insight: The largest available datasets of text and images are collected indiscriminately, with little curation after the fact. Removing objectionable material from such immense corpora is challenging. Researchers often rely on automatic filters like the CLIP similarity between images and text to filter out bad data. To create larger datasets, they often relax those filters. Consequently, larger datasets can harbor a higher proportion of objectionable material than smaller datasets, and training on them could yield models whose performance is more biased.
How it works: The authors compared hateful language in LAION 400M, which comprises 400 million image-text pairs scraped from the web, to similar data in LAION 2B-en, which includes 2 billion image-text pairs also scraped from the web. They also analyzed racial biases present in models trained on both datasets.
- To identify hateful language, the authors ran pysentimiento, a Python library for sentiment analysis, on the text of each text-image example to find the probability that it belonged to one of three categories: hateful, targeted (that is, hateful and aimed at a specific person or group), or aggressive. They assessed each dataset according to its Hate Content Rate (HCR), the proportion of examples whose probability of being hateful, targeted, or aggressive surpassed a threshold value.
- To compare racial bias, they trained identical OpenCLIP architectures on each dataset. Then they used the models to classify headshots of nearly 600 individuals along with their self-identified race and gender as eight classes that included “human being,” “gorilla,” “suspicious person,” and “criminal.” They evaluated the models’ bias based on the percentage of faces associated with a given race and gender they classified with a label other than “human being.”
Results: The authors found a statistically-significantly lower proportion of hateful content in the smaller dataset. LAION-400M’s HCR in the “hateful” category was up to 0.1 percent lower relative to LAION-2B. The probability that a model would classify a face as “human being” fell from 18.6 percent for OpenCLIP-400M to 9.4 percent for OpenCLIP-2B, and the probabilities of classification as “criminal” and “suspicious person” rose. OpenCLIP-400M classified a portrait of a black man as a criminal 14 percent of the time, while OpenCLIP-2B did so 77.4 percent of the time. Despite the increase in biased classifications, OpenCLIP-2B achieved 1.5 percent higher accuracy on ImageNet.
Why it matters: Increasing numbers of open source models and consumer-facing products are trained on large, web-scraped datasets. For example, Stable Diffusion was trained largely on the 5B version of LAION. This work throws up a red flag for machine learning practitioners to consider the bias such training can impart, the harm such models might do, and the methods used to collect and curate large datasets.
We’re thinking: This work goes to show that data-centric AI is applicable even to the largest datasets. It's easier to focus on higher-quality data sources when collecting 400 million examples than 2 billion examples.