With access to a trained model, an attacker can use a reconstruction attack to approximate its training data, including privacy-sensitive examples such as medical images. A method called InstaHide recently won acclaim for promising to make such examples unrecognizable to human eyes while retaining their utility for training. Researchers cracked it in short order.
What’s new: InstaHide aims to scramble images in a way that can’t be reversed. Nicholas Carlini and researchers at Berkeley, Columbia, Google, Princeton, Stanford, University of Virginia, and University of Wisconsin defeated InstaHide to recover images that look a lot like the originals.
Key insight: InstaHide scrambles images by summing several of them (typically two sensitive and four public images chosen at random) with random weights, then randomly flipping the sign of each pixel value. But a weighted sum is a linear, reversible operation, and flipping a pixel’s sign leaves its magnitude intact. Consequently, a system of linear equations can be devised to reverse the process.
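As a concrete illustration, here is a minimal NumPy sketch of an InstaHide-style encoding that follows the description above. The function name, the Dirichlet weight distribution, and the image counts are illustrative assumptions, not the InstaHide team’s exact implementation.

```python
import numpy as np

def instahide_encode(private_imgs, public_imgs, k_private=2, k_public=4, rng=None):
    """Illustrative sketch of an InstaHide-style encoding: a random weighted
    sum of a few sensitive and several public images, followed by a random
    per-pixel sign flip. Images are same-shape float arrays."""
    rng = np.random.default_rng() if rng is None else rng

    # Pick the images to mix: a few sensitive ones plus several public ones.
    chosen = [private_imgs[i] for i in rng.choice(len(private_imgs), k_private, replace=False)]
    chosen += [public_imgs[i] for i in rng.choice(len(public_imgs), k_public, replace=False)]

    # Random mixing weights that sum to 1 (a Mixup-style linear combination).
    weights = rng.dirichlet(np.ones(len(chosen)))
    mixed = sum(w * img for w, img in zip(weights, chosen))

    # Flip each pixel's sign at random. The sign is destroyed, but the
    # magnitude |pixel| is left intact -- the property the attack exploits.
    signs = rng.choice([-1.0, 1.0], size=mixed.shape)
    return signs * mixed
```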
How it works: The authors applied InstaHide to images to produce scrambled targets for their attack. CIFAR-10, CIFAR-100, and STL-10 stood in for sensitive datasets, while ImageNet served as the non-sensitive dataset. Then they undid the steps of the InstaHide algorithm in reverse order.
- The attack first takes the absolute value of a scrambled image to make all pixel values positive. This sets up the data for the model used in the next step.
- The authors trained a Wide ResNet-28 to determine whether any two scrambled images come from the same original.
- They constructed a graph in which every vertex represented a scrambled image, and an edge connected two images that the network judged to share at least one parent (see the graph sketch after this list).
- Knowing which scrambled images shared a parent, the authors formulated a system of linear equations to reconstruct the parents (see the reconstruction sketch below). (Because ImageNet contains so many examples, a shared parent was highly unlikely to be non-sensitive; the equations treat contributions from public images as noise.)
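The graph-building step might look something like the sketch below, where `same_parent_prob` is a hypothetical stand-in for the trained Wide ResNet-28 comparator and the threshold is an assumed value; the authors’ pipeline differs in its details.

```python
import itertools

import networkx as nx
import numpy as np

def build_shared_parent_graph(encodings, same_parent_prob, threshold=0.5):
    """Sketch: connect pairs of scrambled images that a pairwise comparator
    predicts share at least one parent. `same_parent_prob(a, b)` returns a
    score in [0, 1]; here it stands in for the trained Wide ResNet-28."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(encodings)))
    for i, j in itertools.combinations(range(len(encodings)), 2):
        # The comparator sees absolute pixel values, since signs are random.
        if same_parent_prob(np.abs(encodings[i]), np.abs(encodings[j])) > threshold:
            graph.add_edge(i, j)  # edge: these two encodings share a parent
    return graph
```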
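And the final reconstruction might be sketched as follows, under simplifying assumptions: the absolute pixel values of the encodings are given, the matrix of mixing weights linking m encodings to n sensitive parents has already been estimated, and public-image contributions are lumped into noise. The sketch recovers the parents by gradient descent on the squared difference between predicted and observed pixel magnitudes; it is not the authors’ exact solver.

```python
import numpy as np

def reconstruct_parents(abs_encodings, weights, steps=2000, lr=0.1):
    """Sketch: recover n sensitive parents from m scrambled images by
    minimizing sum_j || |W_j x| - e_j ||^2, where e_j holds the absolute
    pixel values of encoding j and W is the assumed (m, n) weight matrix."""
    m, n = weights.shape
    pixels = abs_encodings.reshape(m, -1)                  # (m, d) magnitudes
    rng = np.random.default_rng(0)
    x = rng.normal(scale=0.1, size=(n, pixels.shape[1]))   # parents to recover
    for _ in range(steps):
        pred = weights @ x                                 # signed mixtures
        residual = np.abs(pred) - pixels                   # magnitude mismatch
        grad = weights.T @ (residual * np.sign(pred))      # gradient of the loss
        x -= lr * grad / m
    return x.reshape((n,) + abs_encodings.shape[1:])
```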
Results: The authors tested their approach using the CIFAR-10 and CIFAR-100 test sets as proxies for sensitive data. Subjectively, the reconstructed images closely resembled the originals. They also tried it on the InstaHide Challenge, a collection of 5,000 scrambled versions of 100 images published by the InstaHide team. They found an approximate solution in under an hour, and InstaHide’s inventors agreed that they had met the challenge.
Why it matters: Once personally identifiable information is leaked, it’s impossible to unleak. Machine learning must protect privacy with the utmost rigor.
We’re thinking: The authors show that their method can work well if the scrambled training images are available. It remains to be seen whether it works given access only to a trained model.