Generative adversarial networks clearly learn to extract meaningful information about images. After all, they dream up pictures that, to human eyes, can be indistinguishable from photos. Researchers at DeepMind tapped that power, building a GAN that generates feature vectors from images.
What’s new: Jeff Donahue and Karen Simonyan adapted the state-of-the-art BigGAN image synthesizer for representation learning. They modified its discriminator, which learns to differentiate between artificially generated images and training images, and added an encoder network adapted from an earlier model, BiGAN. The new network, dubbed BigBiGAN, not only generates superb images but also learns feature vectors that help existing image recognition networks do a better job.
Key insight: An encoder coupled with a powerful generator makes an effective representation learner.
How it works: BigBiGAN consists of three main components: generator, encoder, and discriminator.
- The generator learns to create an image from random noise sampled from a simple prior distribution. Its goal is to generate an image realistic enough to fool the discriminator.
- Working in parallel with the generator, the encoder (a network such as ResNet-50) learns a mapping from an input image to a feature vector.
- The discriminator takes in an image paired with a feature vector: either a generated image with the noise that produced it, or a real image with the feature vector the encoder extracted from it. It outputs a score indicating whether it reckons the pair came from the real data via the encoder rather than from the generator.
- The discriminator provides two other outputs as well: a score that depends only on the image and another that depends only on the feature vector. These extra terms help ensure that the network learns the distributions of the images and feature vectors individually, in addition to their joint distribution (see the sketch below).
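The scoring scheme can be sketched in a few lines of PyTorch. This is an illustrative toy, not DeepMind's implementation: the tiny multilayer perceptrons, layer sizes, and placeholder data are assumptions standing in for BigGAN's generator and a ResNet-50 encoder.

```python
# Toy sketch of BigBiGAN's three components and the discriminator's
# joint + unary scoring. All modules and sizes are illustrative stand-ins.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 3 * 32 * 32  # toy sizes (assumption)

generator = nn.Sequential(            # noise -> fake image
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
encoder = nn.Sequential(              # real image -> inferred feature vector
    nn.Linear(image_dim, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)

class Discriminator(nn.Module):
    """Scores (image, feature) pairs with a joint term plus two unary terms."""
    def __init__(self):
        super().__init__()
        self.f_x = nn.Sequential(nn.Linear(image_dim, 256), nn.ReLU())   # image branch
        self.f_z = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU())   # feature branch
        self.s_x = nn.Linear(256, 1)        # unary score on the image alone
        self.s_z = nn.Linear(64, 1)         # unary score on the feature alone
        self.s_xz = nn.Linear(256 + 64, 1)  # joint score on the pair

    def forward(self, x, z):
        hx, hz = self.f_x(x), self.f_z(z)
        return self.s_x(hx) + self.s_z(hz) + self.s_xz(torch.cat([hx, hz], dim=1))

disc = Discriminator()

# One scoring pass: real images paired with their encoded features,
# generated images paired with the noise that produced them.
real_images = torch.rand(8, image_dim)                 # placeholder data (assumption)
noise = torch.randn(8, latent_dim)
fake_images = generator(noise)
real_score = disc(real_images, encoder(real_images))   # discriminator pushes this up
fake_score = disc(fake_images, noise)                  # discriminator pushes this down
```

During training, the discriminator is rewarded for scoring encoder pairs higher than generator pairs, while the generator and encoder are trained to push the scores the other way, which is what forces the encoder's features to carry real information about the images.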
Results: Used to train an image classifier via transfer learning, features from BigBiGAN match the best prior unsupervised representation learning approaches on ImageNet classification. Used as an image generator, it sets a new state of the art for unconditional ImageNet generation as measured by inception score (which rates how recognizable and varied generated images are) and Fréchet inception distance (the statistical distance between the feature distributions of real and generated images).
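The transfer step follows the usual linear-evaluation recipe: freeze the trained encoder and fit only a linear classifier on its features. Here is a minimal sketch of that recipe under stated assumptions; the toy encoder, dataset size, and class count are placeholders, not BigBiGAN's actual ImageNet setup.

```python
# Linear-evaluation sketch: frozen encoder features + a linear classifier.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

image_dim, latent_dim, num_classes = 3 * 32 * 32, 64, 10     # toy sizes (assumption)
encoder = nn.Sequential(nn.Linear(image_dim, 256), nn.ReLU(),
                        nn.Linear(256, latent_dim))           # stands in for a trained encoder

images = torch.rand(100, image_dim)                # placeholder images (assumption)
labels = torch.randint(0, num_classes, (100,))     # placeholder labels (assumption)

with torch.no_grad():                              # the encoder stays frozen
    features = encoder(images).numpy()

# Only the linear classifier is trained on the frozen features.
probe = LogisticRegression(max_iter=1000).fit(features, labels.numpy())
print("linear-probe accuracy:", probe.score(features, labels.numpy()))
```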
Why it matters: Representation learning with GANs can take advantage of the world’s massive amounts of unlabeled data. BigBiGAN demonstrates that representations learned by GANs are transferable to tasks beyond image generation.
Takeaway: BigBiGAN takes us one step closer to bridging the gap between what models understand and how they can express that understanding to us.