Models that generate text and images are raising thorny questions about the ownership of both their training data and their output.
What’s new: The companies that provide popular tools for generating text and images are fighting a barrage of lawsuits. TechCrunch surveyed the docket.
Legal actions: Three lawsuits are in progress:
- A group of artists filed a class-action lawsuit in a United States court against Stability AI and Midjourney, companies that provide image generators, and DeviantArt, an online community that hosts its own image generator. The lawsuit claims that the models’ ability to generate work “in the style of” a given artist infringes artists’ intellectual property rights and harms them financially.
- In a separate action, writer, programmer, and lawyer Matthew Butterick brought a class-action claim against Microsoft, OpenAI, and GitHub in a U.S. court. The plaintiff alleges that Copilot, a model that generates computer code, outputs open-source code without properly crediting its creators. Butterick is represented by the same lawyers as the artists who sued Stability AI, Midjourney, and DeviantArt.
- Getty Images announced its intent to sue Stability AI in a British court for using images scraped from Getty’s collection to train its models.
Defense measures: Companies are taking steps to protect themselves from legal risk.
- OpenAI asserted in a court filing that its use of open-source code to train Copilot is protected by the U.S. doctrine of fair use, which allows limited reproduction of copyrighted materials for purposes such as criticism, commentary, news reporting, and scholarship. Stability AI has made a similar claim in the press. In 2015, a U.S. court ruled that Google’s digital scanning of books was fair use.
- Stability AI plans to allow artists to opt out of inclusion in the dataset used to train the next version of Stable Diffusion.
- GitHub added a filter to Copilot that checks the model’s output against public code on GitHub and hides suggestions that are too similar to existing code.
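GitHub hasn’t published the filter’s internals. Purely as an illustration, the sketch below shows one naive way such a similarity check could work, comparing a generated snippet’s token n-grams against a small in-memory corpus of public code. The function names, threshold, and n-gram size are assumptions made for the sake of the example, not GitHub’s implementation.

```python
# Hypothetical sketch only; GitHub's actual filter is not public.
# Flags generated code that overlaps heavily with known public code,
# using token n-gram shingles and Jaccard similarity.

import re


def token_ngrams(code: str, n: int = 8) -> set[tuple[str, ...]]:
    """Split code into rough tokens and collect n-gram shingles."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def too_similar(generated: str, corpus: list[str], threshold: float = 0.6) -> bool:
    """Return True if the generated snippet closely matches any corpus entry."""
    gen_grams = token_ngrams(generated)
    if not gen_grams:
        return False
    for known in corpus:
        known_grams = token_ngrams(known)
        if not known_grams:
            continue
        # Jaccard similarity: shared shingles over total distinct shingles.
        similarity = len(gen_grams & known_grams) / len(gen_grams | known_grams)
        if similarity >= threshold:
            return True
    return False


# Usage: suppress a suggestion rather than surface near-verbatim public code.
public_snippets = ["def add(a, b):\n    return a + b"]
suggestion = "def add(a, b):\n    return a + b"
if too_similar(suggestion, public_snippets):
    suggestion = None  # hide the completion from the user
```

A production system would have to check against billions of files, so a pairwise scan like this wouldn’t scale; near-duplicate detection techniques such as MinHash with locality-sensitive hashing are the usual way to make such lookups fast.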
Why it matters: Companies that aim to capitalize on AI’s ability to generate text, images, code, and more raised tens of millions of dollars in 2022. Much of that value could evaporate if courts decide these companies must compensate the sources of their training data or scrap models trained on data obtained inappropriately.
We’re thinking: Laws that protect intellectual property haven’t yet caught up with AI. Without legal clarity, engineers have less freedom to innovate, and investors have less certainty about which approaches to support.