Will biases in training data unwittingly turn AI into a tool for persecution?
The fear: Bias encoded in software used by nominally objective institutions like, say, the justice or education systems will become impossible to root out. Result: injustice baked into the very institutions we count on to maintain a fair society.
What could go wrong: AI learns from data to reach its own conclusions. But training datasets are often gathered from and curated by humans who have social biases. The risk that AI will reinforce existing social biases is rising as the technology increasingly governs education, employment, loan applications, legal representation, and press coverage.
Behind the worries: Bias in AI is already making headlines.
- Models that healthcare providers used to allocate care to 100 million patients with chronic ailments like heart disease and diabetes underestimated how urgently black patients needed care, allowing white patients to receive critical care first.
- Amazon developed an AI tool to find the best candidates among job applicants. The company abandoned it after an in-house audit found that the system rated male applicants much higher than female ones.
- Machine learning doesn’t just absorb biases encoded in data; it amplifies them. In the paper “Men Also Like Shopping,” researchers noted that an image classification model identified the subjects in 84 percent of photos of people cooking as women, even though only 66 percent of the images actually contained women. Word embeddings used by the model over-associated the act of cooking with female subjects. (A sketch of how such amplification can be measured follows this list.)
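
To make the amplification concrete, here is a minimal sketch of how one might compare a dataset's gender skew for an activity against a trained model's predictions. The label lists and the 66/84 split are illustrative stand-ins chosen to mirror the figures above, not data from the paper.

```python
# Minimal sketch: dataset bias vs. model bias for one activity ("cooking").
# The labels below are hypothetical stand-ins, not the paper's data.

from collections import Counter

# Ground-truth gender labels for images annotated with the activity "cooking"
train_labels = ["woman"] * 66 + ["man"] * 34        # dataset: 66% women
model_predictions = ["woman"] * 84 + ["man"] * 16   # model output: 84% women

def share_of_women(labels):
    """Fraction of 'cooking' images associated with women."""
    counts = Counter(labels)
    return counts["woman"] / len(labels)

dataset_bias = share_of_women(train_labels)
model_bias = share_of_women(model_predictions)

print(f"Dataset associates cooking with women in {dataset_bias:.0%} of images")
print(f"Model predicts women for {model_bias:.0%} of cooking images")
print(f"Bias amplification: {model_bias - dataset_bias:+.0%}")
```

If the model merely reproduced the dataset's skew, the two percentages would match; the gap between them is the amplification.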
How scared should you be: Until companies announce that they train their models on certified bias-free datasets as loudly as they trumpet machine-learning buzzwords, or until such systems pass a third-party audit, it’s a good bet their technology unfairly advantages some people over others.
What to do: In a 2018 keynote, researcher Rachel Thomas explains how machine learning engineers can guard against bias at each step of the development process. She recommends that every dataset come with a sheet describing how the set was compiled and any legal or ethical concerns that occurred to those who assembled it. She also suggests that teams include people from various backgrounds who may be alert to different sorts of bias.
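
As one way to picture the dataset-sheet suggestion, here is a minimal sketch of such a sheet as a machine-readable record. The field names and the example dataset are assumptions for illustration, not a format taken from Thomas's keynote or any published standard.

```python
# Minimal sketch of a dataset "sheet" in the spirit of the recommendation above.
# Field names and example values are hypothetical.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetSheet:
    name: str
    collection_method: str                  # how the data was gathered
    curation_notes: str                     # filtering, labeling, known gaps
    known_biases: List[str] = field(default_factory=list)
    legal_ethical_concerns: List[str] = field(default_factory=list)

sheet = DatasetSheet(
    name="loan-applications-2019",          # hypothetical dataset
    collection_method="Exported from a single regional lender's records",
    curation_notes="Rejected applications under-sampled; labels assigned by loan officers",
    known_biases=["Applicants skew urban and high-income"],
    legal_ethical_concerns=["Contains protected attributes; review before release"],
)

print(sheet)
```

Keeping a record like this alongside the data gives reviewers, whether in-house teams or third-party auditors, something concrete to check before a model trained on it goes into production.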