The business of supplying labeled data for building AI systems is a global industry. But the people who do the labeling face challenges that impinge on the quality of both their work and their lives.
What’s new: The Verge interviewed more than two dozen data annotators, revealing a difficult, precarious gig economy. Workers often find themselves jaded by low pay, uncertain schedules, escalating complexity, and deep secrecy about what they’re doing and why.
How it works: Companies that provide labeling services, including Centaur Labs, Surge AI, and Remotasks (a division of data supplier Scale AI), use automated systems to manage gig workers worldwide. Workers undergo qualification exams, training, and performance monitoring to perform tasks like drawing bounding boxes, classifying sentiments expressed by social media posts, evaluating video clips for sexual content, sorting credit-card transactions, rating chatbot responses, and uploading selfies of various facial expressions.
- The pay scale varies widely, depending on the worker’s location and the task assigned, from $1 per hour in Kenya to $25 per hour or more in the U.S. Tasks that require specialized knowledge, sound judgment, or extensive labor can pay up to $300 apiece.
- To protect their clients’ trade secrets, employers dole out assignments without identifying the client, application, or function. Workers don’t know the purpose of the labels they’re called upon to produce, and they’re warned against talking about their work.
- The assignments often begin with ambiguous instructions. They may call for, say, labeling actual clothing that might be worn by a human being, so clothes on a toy doll or in a cartoon drawing clearly don’t qualify. But does clothing reflected in a mirror? Does a suit of armor count as clothing? How about swimming fins? As developers iterate on their models, the rules that govern how data should be labeled grow more elaborate, forcing labelers to keep track of an expanding list of exceptions and special cases (see the sketch after this list). Workers who make too many mistakes may lose the gig.
- Work schedules are sporadic and unpredictable. Workers don’t know when the next assignment will arrive or how long it will last, whether it will be interesting or soul-crushing, or whether it will pay well or poorly. Such uncertainty, along with the gap between workers’ wages and the revenues their employers report in the press, can leave them demoralized.
- Many labelers manage the stress by gathering in clandestine WhatsApp groups to share information and seek advice about how to find good gigs and avoid bad ones. There, they learn tricks like using existing AI models to do the work, connecting through proxy servers to disguise their locations, and maintaining multiple accounts as a hedge against suspension if they’re caught breaking the rules.
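The way simple instructions accrete exceptions is easy to picture in code. Below is a minimal, hypothetical sketch: the task, the item attributes, and the "guideline versions" are invented for illustration and don’t correspond to any provider’s actual instructions.

```python
# Hypothetical illustration of how labeling guidelines accumulate special cases.
# The attributes and "guideline versions" below are invented; they do not
# reflect any labeling provider's actual instructions.
from dataclasses import dataclass


@dataclass
class ImageItem:
    worn_by_real_person: bool   # False for toy dolls and cartoon drawings
    seen_in_mirror: bool        # clothing visible only as a reflection
    is_armor: bool
    is_swim_gear: bool          # e.g., swimming fins


def should_label_as_clothing(item: ImageItem) -> bool:
    """Decide whether an item counts as 'clothing' under the evolving guidelines."""
    # Guideline v1: label only clothing that a real person might wear.
    if not item.worn_by_real_person:
        return False
    # Guideline v2: a suit of armor is treated as a prop, not clothing.
    if item.is_armor:
        return False
    # Guideline v3: swimming fins and similar gear don't count.
    if item.is_swim_gear:
        return False
    # Guideline v4: clothing reflected in a mirror still counts,
    # so no extra exclusion is needed for item.seen_in_mirror.
    return True


# Each revision adds another branch the annotator must keep in mind.
print(should_label_as_clothing(ImageItem(True, True, False, False)))   # True
print(should_label_as_clothing(ImageItem(True, False, True, False)))   # False
```

The specific rules aren’t the point; the shape is. Each clarification adds a branch, and annotators are scored against all of them at once.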
What they’re saying: “AI doesn’t replace work. But it does change how work is organized.” —Erik Duhaime, CEO, Centaur Labs
Behind the news: Stanford computer scientist Fei-Fei Li was an early pioneer in crowdsourcing data annotations. In 2007, she led a team at Princeton to scale the number of images used to train an image recognizer from tens of thousands to millions. To get the work done, the team hired thousands of workers via Amazon’s Mechanical Turk platform. The result was ImageNet, a key computer vision dataset.
Why it matters: Developing high-performance AI systems depends on accurately annotated data. Yet the harsh economics of annotating at scale encourage service providers to automate the work and workers to either cut corners or drop out. Notwithstanding recent improvements (for instance, Google raised its base wage for contractors who evaluate search results and ads to $15 per hour), everyone would benefit from treating data annotation less like gig work and more like a profession.
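One common way to audit annotation accuracy is to have two labelers annotate the same items and measure how much they agree beyond chance. Here is a minimal sketch of Cohen’s kappa in plain Python; the sentiment labels are made-up data, not drawn from any provider.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement if each annotator labeled at random according to
    # their own observed label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Two annotators labeling the sentiment of the same ten posts (made-up data).
a = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg"]
b = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neg", "pos", "neg"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # 1.0 is perfect agreement, 0.0 is chance
```

A low kappa on a batch is a cheap signal that either the guidelines or the labels need attention before the data reaches a model.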
We’re thinking: The value of skilled annotators becomes even more apparent as AI practitioners adopt data-centric development practices that make it possible to build effective systems from relatively few examples. When examples are few, selecting and annotating each one properly is critical.