We know that image generators create wonderful original works, but do they sometimes replicate their training data? Recent work found that replication does occur.
What's new: Gowthami Somepalli and colleagues at the University of Maryland devised a method that spots instances of image generators copying from their training sets, from entire images to isolated objects, with minor variations.
Key insight: A common way to detect similarity between two images is to produce an embedding of each and compute the dot product between the embeddings; a high value indicates similar images. However, while this method detects similarity between whole images, it can miss similarity confined to a small region. To detect a small area shared by two images, one strategy is to split their embeddings into corresponding chunks, compute the dot product between each pair of chunks, and look for high values.
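To make the insight concrete, here is a minimal sketch in NumPy. The unit normalization and the chunk count are illustrative assumptions, not settings taken from the paper:

```python
import numpy as np

def _unit(v):
    """L2-normalize a vector (an assumption here, so dot products act like cosine similarities)."""
    return v / np.linalg.norm(v)

def global_score(a, b):
    """Whole-image similarity: one dot product over the full embeddings."""
    return float(_unit(a) @ _unit(b))

def split_score(a, b, n_chunks=16):
    """Local similarity: split both embeddings into corresponding chunks,
    compare each pair of chunks, and keep the largest value. A strong
    match in one chunk survives even when the rest of the image differs."""
    a_chunks = np.array_split(a, n_chunks)
    b_chunks = np.array_split(b, n_chunks)
    return float(max(_unit(ac) @ _unit(bc) for ac, bc in zip(a_chunks, b_chunks)))
```

A pair of images that share only a small object can score low globally yet high on one chunk, which is what the split score is designed to surface.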
How it works: The authors (i) trained image generators, (ii) generated images, (iii) produced embeddings of both the generated images and the training images, (iv) broke the embeddings into chunks, and (v) detected duplications by comparing chunks that represent generated images with those that represent training images.
- First the authors looked for models whose embeddings were effective in detecting replications. They tested 10 pretrained computer vision architectures on a group of five datasets for image retrieval — a task selected because the training sets include replications — and five synthetic datasets that contain replications. The three models whose embeddings revealed duplications most effectively were Swin, DINO, and SSCD, all of which were pretrained on ImageNet.
- Next they generated images. They trained a diffusion model on images drawn from datasets of flowers and faces, using training subsets of varying sizes: smaller (100 to 300 examples), medium (roughly 1,000 to 3,000), and larger (around 8,200). A stripped-down version of such a training loop is sketched after this list.
- Swin, DINO, and SSCD produced embeddings of the training images and the generated images. The authors split these embeddings into many smaller, evenly sized chunks. To calculate a similarity score, they computed the dot product between corresponding pairs of chunks (that is, the nth chunk representing a training image and the nth chunk representing a generated image). The score was the maximum of these dot products; a sketch of this scoring also appears after this list.
- To test their method under conditions closer to real-world use, the authors performed similar experiments on a pretrained Stable Diffusion model. They generated 9,000 images from 9,000 captions chosen at random from the Aesthetics subset of LAION. They produced embeddings of the generated images and the LAION Aesthetics images, split them into chunks, and compared the chunks via dot products (the generation step is sketched after this list as well).
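A stripped-down, unconditional denoising-diffusion training loop of the kind the flowers-and-faces experiments call for might look like the following, using Hugging Face's diffusers library. The model size, noise schedule, and random stand-in data are assumptions, not the authors' configuration:

```python
import torch
from diffusers import UNet2DModel, DDPMScheduler

# Random tensors stand in for batches of 64x64 flower or face images.
dataloader = [torch.randn(4, 3, 64, 64) for _ in range(8)]

model = UNet2DModel(sample_size=64, in_channels=3, out_channels=3)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for images in dataloader:
    noise = torch.randn_like(images)
    timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (images.shape[0],))
    noisy = scheduler.add_noise(images, noise, timesteps)  # forward diffusion
    pred = model(noisy, timesteps).sample                  # predict the added noise
    loss = torch.nn.functional.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```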
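The chunk-wise scoring can be sketched as follows, assuming embeddings from a model such as DINO or SSCD have already been stacked into matrices. The chunk count and the per-chunk normalization are illustrative assumptions rather than the authors' exact choices:

```python
import numpy as np

def _unit_rows(m):
    """L2-normalize each row (an assumed normalization)."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def split_similarity(gen_emb, train_emb, n_chunks=16):
    """gen_emb: (n_gen, d); train_emb: (n_train, d).
    Returns an (n_gen, n_train) matrix whose entries are the maximum
    dot product over corresponding embedding chunks."""
    gen_chunks = np.array_split(gen_emb, n_chunks, axis=1)
    train_chunks = np.array_split(train_emb, n_chunks, axis=1)
    per_chunk = np.stack(
        [_unit_rows(g) @ _unit_rows(t).T for g, t in zip(gen_chunks, train_chunks)]
    )  # shape: (n_chunks, n_gen, n_train)
    return per_chunk.max(axis=0)

# Toy demo with random embeddings standing in for real features.
rng = np.random.default_rng(0)
gen, train = rng.normal(size=(5, 512)), rng.normal(size=(100, 512))
scores = split_similarity(gen, train)
print(scores.shape)  # (5, 100)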
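For the Stable Diffusion experiment, generating images from sampled captions might look like this with the diffusers library; the checkpoint name and captions file are placeholders rather than the authors' exact setup, and a GPU is assumed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint and caption file; swap in captions sampled
# from LAION Aesthetics to mirror the setup described above.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

with open("laion_aesthetics_captions.txt") as f:
    captions = [line.strip() for line in f]

for i, caption in enumerate(captions):
    image = pipe(caption).images[0]  # one image per caption
    image.save(f"generated/{i:05d}.png")
```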
Results: For each generated image, the authors found the 20 most similar images in the training set (that is, those whose chunked embeddings yielded the highest dot products). Inspecting those images, they determined that the diffusion model sometimes copied elements from the training set. They also plotted histograms of the similarity between images within a training set and the similarity between training images and generated images; the more the two histograms overlapped, the fewer replications they expected to find. Both the histograms and visual inspection indicated that models trained on smaller datasets produced more replications. However, in the tests with Stable Diffusion, 1.88 percent of generated images had a similarity score greater than 0.5, a threshold above which the authors observed obvious replications, despite that model's pretraining on a large dataset.
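On top of a score matrix like the one produced above, the retrieval and histogram analysis could be sketched as follows; the nearest-neighbor framing and the binning are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def top_k_matches(scores, k=20):
    """For each generated image (row of the score matrix), return the
    indices of the k training images with the highest similarity."""
    return np.argsort(scores, axis=1)[:, ::-1][:, :k]

def compare_histograms(train_scores, gen_scores, threshold=0.5):
    """train_scores: each training image's similarity to its nearest other
    training image; gen_scores: each generated image's similarity to its
    nearest training image. Heavy overlap suggests few replications; a
    right-shifted generated curve suggests copying."""
    bins = np.linspace(0.0, 1.0, 50)
    plt.hist(train_scores, bins=bins, alpha=0.5, density=True, label="train vs. train")
    plt.hist(gen_scores, bins=bins, alpha=0.5, density=True, label="generated vs. train")
    plt.axvline(threshold, linestyle="--", label=f"threshold = {threshold}")
    plt.xlabel("similarity score")
    plt.ylabel("density")
    plt.legend()
    plt.show()
    print(f"{(gen_scores > threshold).mean():.2%} of generated images exceed {threshold}")
```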
Why it matters: Does training an image generator on artworks without permission from the copyright holder violate the copyright? If the image generator literally copies the work, then the answer would seem to be “yes.” Such issues are being tested in court. This work moves the discussion forward by proposing a more sensitive measure of similarity between training and generated images.
We're thinking: Picasso allegedly said that good artists borrow while great artists steal. . . .