Sasha Luccioni: Respect for Human Creativity and Agency

Before this past year, when I told people I worked in AI, more often than not I was met with a blank stare and sometimes a question along the lines of: “You mean like robots?” In the last year, the seemingly magical abilities of AI models, especially large language models (LLMs), have broken into mainstream awareness, and now I’m greeted with questions like: “How does ChatGPT really work?” But if we were more transparent about the sheer amount of human time and labor that went into training LLMs, I’m sure the questions would be more along the lines of: “How do I keep my data from being used for training AI models?” Because as impressive as ChatGPT’s knock-knock jokes or chocolate chip cookie recipes are, they are definitely not magical: they are built upon the work and creativity of human beings, who deserve credit for their contributions.

AI models are black boxes that, to a user, appear to save labor. But, in fact, huge amounts of labor are required to develop them: from the books, websites, drawings, photos, and videos hoovered up without consent to the invisible armies of underpaid workers who spend their days ranking and improving LLM outputs. And all of this training is powered by massive amounts of natural resources that are extracted by still more human labor: rare metals to make those precious GPUs, water to cool them, energy to make them crunch numbers and output probabilities. 

Until very recently, issues of copyright and consent were overlooked when it came to AI training data. Existing laws were assumed not to apply to training AI models, and the “move fast and break things” motto prevailed. But in the past year, authors like Sarah Silverman and George R.R. Martin have sued AI companies to assert their rights as content creators whose work was used without their permission to train AI models. While it’s too early to say how these lawsuits (and others) will pan out and how that will shape the future of copyright law in the United States and beyond, I hope that new mechanisms will be developed to allow content creators more control over their work. We are starting to see this from organizations like Spawning, which helped create ai.txt files that restrict the use of content for commercial AI training. I hope to see more AI developers respect these mechanisms and adopt opt-in (as opposed to opt-out) approaches for gathering consent-based datasets.

Apart from training data, development itself requires increasing amounts of labor. A new step has recently been added to the training process: RLHF, or reinforcement learning from human feedback. This step employs human annotators to rank text generated by large language models, providing feedback that makes the models better at responding to human instructions and less likely to produce toxic output. This ranking process is done at scale by outsourced workers in offices in Kenya and prisons in Finland. Some of these workers are paid less than $2 an hour to label texts for hours on end, although we don’t have the overall numbers because AI companies are increasingly opaque about how they train AI models. Creating data for AI has become a new gig economy, but this immense amount of human labor and creativity remains largely unseen and unrecognized.

And as AI increasingly pushes out the very designers and artists whose life’s work was used to train the models in the first place (why pay a photographer when you can use AI to generate a custom stock photograph on demand?), it’s crucial that we stop and reflect upon the relationship between human labor and creativity and AI. AI is truly an exciting new technology, and one that is set to provide huge profits to many tech companies, but artists and gig workers are getting barely a crumb of the pie, if anything at all. It’s not too late to reimagine AI as a technology that respects human agency and creativity by properly recognizing the human time and effort that go into training AI models.

My hope for 2024 is that we start recognizing the knowledge, wisdom, and creativity that go into training AI models, being more transparent about AI’s human costs, and developing increasingly human-centric technologies.

Sasha Luccioni is a research scientist and climate lead at Hugging Face, a founding member of Climate Change AI, and a board member of Women in Machine Learning.
