A German court dismissed a copyright lawsuit against LAION, the nonprofit responsible for large-scale image datasets used to train Midjourney, Stable Diffusion, and other image generators.
What’s new: The court rejected a lawsuit claiming that cataloging images on the web to train machine learning models violates the image owners’ copyrights. It ruled that LAION’s activities fall under protections for scientific research.
How it works: LAION doesn’t distribute images. Instead, it compiles links to images and related text that are published on publicly available websites. Model builders who wish to use the images and/or text must download them from those sources. In 2023, photographer Robert Kneschke sued LAION for including his photos. The court’s decision emphasized several key points.
- LAION, while compiling links to images, had indeed made unauthorized copies of images protected by copyright, as defined by German law. However, Germany’s Copyright Act allows unauthorized use of copyrighted works for scientific research. The court ruled that LAION had collected the material for this purpose, so it did not violate copyrights.
- Moreover, the court found that downloading images and text in order to correlate them likely fell under a further exemption to copyright for data mining. This finding wasn’t definitive because the exemption for research made it irrelevant, but the court mentioned it to help guide future rulings.
- The dataset’s noncommercial status was a key factor in the ruling. LAION distributed the dataset for free, and no commercial entity controlled its operations. Although a LAION dataset may be used to train a machine learning model that’s intended to be sold commercially, this is not sufficient to classify creating such datasets as commercial activity. The plaintiff contended that, because some LAION members have paid roles in commercial companies, LAION could be considered a commercial entity. However, the court rejected that argument.
Behind the news: Several other artists have sued LAION, which stands for Large-scale AI Open Network, claiming that the organization used their works without their consent. They have also sued AI companies, including a class action suit against Stability AI, Midjourney, and DeviantArt for using materials under copyright, including images in LAION’s datasets, to train their models. Similar cases have been brought against makers of music generators and coding assistants. All these lawsuits, which are in progress, rest on the plaintiff’s claim that assembling a training dataset of copyrighted works infringes copyrights.
Why it matters: The German ruling is the first AI-related decision in Europe since the adoption of the AI Act, and the court took that law’s intent into account when making its decision. It affirms that creating text-image pairs of publicly available material for the purpose of training machine learning models does not violate copyrights, even if commercial organizations later use the data. However, the court did not address whether training AI models on such datasets, or using the trained models in a commercial setting, violates copyrights.
We’re thinking: This decision is encouraging news for AI researchers. We hope jurisdictions worldwide establish that training models on media that’s available on the open web is fair and legal.