Which dataset was used to train a given model? A new method makes it possible to see traces of the training corpus in a model’s output.
What’s new: Alexandre Sablayrolles and colleagues at Facebook and France’s National Institute for Research in Computer Science and Automation adulterated training data with imperceptible signals. Decisions made by models trained on this so-called radioactive data showed signs of the altered corpus.
Key insight: Changes in training data leave corresponding traces in a trained model’s behavior. A small, consistent alteration of the training data shifts the model’s loss in a predictable way.
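To make that concrete, here is a minimal toy sketch (not the paper’s code, and all names and numbers are illustrative): it trains scikit-learn logistic regression twice on a two-class 2D dataset, once on clean data and once with one class displaced by a small constant vector, then compares each model’s loss on displaced versus plain held-out points. Only the model trained on displaced data should show a clear loss gap.

```python
# Toy illustration: a constant displacement of one class's training points
# leaves a measurable trace in the trained model's loss.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n = 1000
X0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(n, 2))   # class 0
X1 = rng.normal(loc=[+1.0, 0.0], scale=1.0, size=(n, 2))   # class 1
X, y = np.vstack([X0, X1]), np.array([0] * n + [1] * n)

shift = np.array([0.0, 0.5])      # the constant displacement ("mark")
X_marked = X.copy()
X_marked[y == 1] += shift         # mark class 1 only

clean_model = LogisticRegression().fit(X, y)
marked_model = LogisticRegression().fit(X_marked, y)

# Held-out class-1 points, evaluated with and without the mark.
X_test = rng.normal(loc=[+1.0, 0.0], scale=1.0, size=(500, 2))
y_test = np.ones(500, dtype=int)

for name, model in [("clean-trained", clean_model), ("marked-trained", marked_model)]:
    loss_plain = log_loss(y_test, model.predict_proba(X_test), labels=[0, 1])
    loss_marked = log_loss(y_test, model.predict_proba(X_test + shift), labels=[0, 1])
    print(f"{name}: loss on plain {loss_plain:.3f} vs. marked {loss_marked:.3f}")
```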
How it works: The researchers replaced a portion of images in a training corpus with marked images. After training a model on the whole dataset, they compared the model’s loss on small subsets of marked and unmarked images.
- Consider a two-dimensional classification task, as illustrated above. Displacing the features of examples in one class by a constant amount shifts the decision boundary. This change acts as a fingerprint for the altered dataset.
- Radioactive data extends this intuition to higher dimensions. The algorithm randomly chooses a direction to shift extracted features of each class. Then it learns how to modify input images most efficiently to produce the same shifts (see the feature-space sketch after this list).
- There are several ways to identify training on radioactive data, depending on the model. A simple one is to compare the model’s loss for a given class on radioactive and unaltered data. A model trained on radioactive data has a lower loss value on radioactive images because the model recognizes the added structure.
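The marking step can be sketched in feature space under simplifying assumptions: draw one random unit “carrier” direction per class and shift a small fraction of that class’s extracted features along it. The image-space optimization the researchers use to realize these shifts in pixels is omitted here, and the function `mark_features` and its parameters are hypothetical.

```python
# Feature-space sketch of radioactive marking (image-space optimization omitted).
import numpy as np

def mark_features(features, labels, num_classes, strength=1.0, frac=0.01, seed=0):
    """Shift a fraction of each class's feature vectors along a random carrier."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    carriers = rng.normal(size=(num_classes, d))        # one random direction per class
    carriers /= np.linalg.norm(carriers, axis=1, keepdims=True)

    marked = features.copy()
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        if len(idx) == 0:
            continue
        chosen = rng.choice(idx, size=max(1, int(frac * len(idx))), replace=False)
        marked[chosen] += strength * carriers[c]         # nudge along the carrier
    return marked, carriers

# Example: mark 1 percent of 10,000 synthetic 512-dimensional features.
feats = np.random.default_rng(1).normal(size=(10_000, 512))
labels = np.random.default_rng(2).integers(0, 10, size=10_000)
marked_feats, carriers = mark_features(feats, labels, num_classes=10)
```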
Results: The researchers marked 1 percent of ImageNet and trained a ResNet-18 on the entire dataset. The model’s loss on subsets of radioactive and normal data differed by a statistically significant amount, confirming that it had been trained on a portion of marked data. Accuracy declined by only 0.1 percent compared to training on standard ImageNet. Different architectures and datasets yielded similar results.
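The loss comparison can be framed as a simple two-sample test, as in the sketch below. The function `detect_radioactive`, the one-sided Welch t-test, and the synthetic loss values are assumptions for illustration, not the paper’s exact statistical procedure.

```python
# Detection sketch: are per-image losses on marked data significantly lower
# than on unmarked data of the same class?
import numpy as np
from scipy import stats

def detect_radioactive(losses_marked, losses_unmarked, alpha=0.05):
    """Return (flag, p-value); flag is True if marked losses are significantly lower."""
    t_stat, p_value = stats.ttest_ind(
        losses_marked, losses_unmarked,
        equal_var=False,          # Welch's t-test
        alternative="less",       # marked losses expected to be lower
    )
    return p_value < alpha, p_value

# Synthetic loss values standing in for a trained model's per-image losses.
rng = np.random.default_rng(0)
losses_marked = rng.normal(loc=0.9, scale=0.3, size=200)    # hypothetical
losses_unmarked = rng.normal(loc=1.0, scale=0.3, size=200)  # hypothetical
flag, p = detect_radioactive(losses_marked, losses_unmarked)
print(f"trained on marked data? {flag} (p = {p:.4f})")
```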
Why it matters: As neural networks become enmeshed more deeply in a variety of fields, it becomes more helpful to know how they were trained — say, to understand bias or track use of proprietary datasets. This technique, though nascent, offers a potential path to that goal.
We’re thinking: Beyond identifying training sets, radioactive data may offer a method to enforce data privacy by making it possible to identify models trained from improperly obtained private data.