A typical GAN’s output doesn’t necessarily reflect the data distribution of its training set. Instead, GANs are prone to modeling the majority of the training distribution, sometimes ignoring rare attributes — say, faces that represent minority populations. A twist on the GAN architecture forces its output to better reflect the diversity of the training data.
What’s new: IMLE-GAN learns to generate all the attributes of its training dataset, including rare ones. Ning Yu spearheaded the research with colleagues at the University of Maryland, Max Planck Institute, the University of California, Berkeley, CISPA Helmholtz Center for Information Security, the Institute for Advanced Study in Princeton, and Google.
Key insight: A GAN’s discriminator learns to tell generated output from real examples, while the generator learns to produce output that fools the discriminator. Ideally, the generator’s output would mirror the training data distribution. In practice, because the generator’s only aim is to fool the discriminator, and the discriminator typically evaluates a single image at a time, the generator can get away with producing mostly common types of examples. The authors had their model compare batches of generated images, as well as interpolations between generated images, with examples from the training set to encourage greater diversity in the output.
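To make the coverage idea concrete, here is a minimal PyTorch-style sketch (our illustration, not the authors’ code) of a nearest-neighbor coverage penalty. The function name, shapes, and feature inputs are our assumptions:

```python
import torch

def coverage_penalty(real_feats, fake_feats):
    """Penalize the generator when some real example has no nearby
    generated sample. real_feats: (N, D) features of real images;
    fake_feats: (M, D) features of generated images. Sketch only.
    """
    # Pairwise distances between every real and generated feature.
    dists = torch.cdist(real_feats, fake_feats)  # (N, M)
    # Match each real example to its nearest generated sample. The
    # loss is small only when every real example is "covered" --
    # something a one-image-at-a-time discriminator cannot enforce.
    return dists.min(dim=1).values.mean()
```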
How it works: IMLE-GAN enhances a GAN with Implicit Maximum Likelihood Estimation (IMLE). Instead of naively adding the IMLE loss to the usual adversarial loss, the authors modified the default IMLE loss and added a novel interpolation loss to compensate for fundamental incompatibilities between the adversarial and IMLE losses.
- IMLE generates a set of images and, for each real image, finds its nearest neighbor among the generated images, penalizing the network in proportion to the distance between the pair. The loss is small only when every real example has a similar generated counterpart. Instead of comparing pixels, as in standard IMLE, the authors compare the images in a feature space. The switch from pixels to features makes the adversarial and IMLE losses more comparable.
- To compute the interpolation loss, the authors create an additional image that is interpolated between two generated images. Then they compare the interpolated image’s features to those of the two real images that were matched to those generated images during the IMLE step.
- To increase representation of rare attributes, the algorithm draws the samples for the IMLE and interpolation losses from a designated set of minority examples, while the adversarial loss draws from all examples (the three losses are combined in the sketch below).
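Here is a minimal sketch of how the three losses might fit together on the generator side. The modules `G` (generator), `D` (discriminator), and `feat` (feature extractor), the latent-space interpolation, and all sizes are our assumptions for illustration; the paper’s exact formulation differs in detail:

```python
import torch
import torch.nn.functional as F

def generator_losses(G, D, feat, z_dim, real_minority,
                     batch_size=32, pool_size=16):
    """Sketch of the generator-side objective. real_minority holds
    at least two images from the designated minority subset.
    """
    # 1) Standard adversarial loss: fool the discriminator, which
    #    judges one image at a time over the full data distribution.
    logits = D(G(torch.randn(batch_size, z_dim)))
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # 2) IMLE loss on the minority subset: match each real example
    #    to its nearest generated sample in feature space (not pixel
    #    space) and penalize that distance, so none goes uncovered.
    z_pool = torch.randn(pool_size, z_dim)
    fake_feats = feat(G(z_pool))                 # (pool_size, D)
    real_feats = feat(real_minority)             # (K, D)
    dists = torch.cdist(real_feats, fake_feats)  # (K, pool_size)
    min_dist, match = dists.min(dim=1)
    imle = min_dist.mean()

    # 3) Interpolation loss: an image generated from a blend of two
    #    matched latent codes (one common way to interpolate between
    #    two generated images) should stay close, in features, to the
    #    two real images those codes were matched to.
    alpha = torch.rand(())
    z_mix = alpha * z_pool[match[0]] + (1 - alpha) * z_pool[match[1]]
    mix_feats = feat(G(z_mix.unsqueeze(0))).squeeze(0)
    interp = ((mix_feats - real_feats[0]).norm()
              + (mix_feats - real_feats[1]).norm())

    return adv + imle + interp
```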
Results: The authors evaluated IMLE-GAN against StyleGAN and a handful of other models using Stacked MNIST, a variation on MNIST that stacks three handwritten digits into the color channels of each image, yielding 1,000 distinct digit combinations. IMLE-GAN reproduced 997 of the combinations, while StyleGAN reproduced 940. Trained on CelebA, a large-scale dataset of celebrity faces, IMLE-GAN generated attributes present in fewer than 6 percent of training examples with higher precision than StyleGAN. For instance, it generated wearers of eyeglasses with 0.904 precision, compared to StyleGAN’s meager 0.719.
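For reference, coverage on Stacked MNIST is conventionally measured by classifying each color channel of a generated image with a pretrained MNIST classifier and counting the distinct three-digit combinations that appear. A sketch, with a hypothetical `digit_classifier`:

```python
import torch

def count_modes(images, digit_classifier):
    """Count distinct three-digit combinations (out of 1,000) in a
    batch of generated Stacked MNIST images of shape (N, 3, 28, 28).
    digit_classifier is a hypothetical pretrained MNIST classifier
    mapping a (B, 1, 28, 28) batch to (B, 10) logits.
    """
    modes = set()
    for img in images:
        # Treat each color channel as one handwritten digit.
        channels = img.unsqueeze(1)                   # (3, 1, 28, 28)
        digits = digit_classifier(channels).argmax(dim=1)
        modes.add(tuple(digits.tolist()))
    return len(modes)
```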
Why it matters: Much of the time, we want our models to learn the data distribution present in the training set. But when fairness or broad representation is at stake, we may need to put a finger on the scale. This work offers an approach to making GANs more useful in situations where diversity or fairness is critical.
We’re thinking: This work helps counter model and dataset bias. But it’s up to us to make sure that training datasets are fair and representative.