Brilliant ideas strike at unlikely moments. Ian Goodfellow conceived generative adversarial networks while spitballing programming techniques with friends at a bar. Goodfellow, who views himself as “someone who works on the core technology, not the applications,” started at Stanford as a premed before switching to computer science and studying machine learning with Andrew Ng. “I realized that would be a faster path to impact more things than working on specific medical applications one at a time,” he recalls. From there, he earned a PhD in machine learning at Université de Montréal, interned at the seminal robotics lab Willow Garage, and held positions at OpenAI and Google Research. Last year, he joined Apple as director of machine learning in the Special Projects Group, which develops technologies that aren’t part of products on the market. His work at Apple is top-secret.
The Batch: How did you come up with the idea for two networks that battle each other?
Goodfellow: I’d been thinking about how to use something like a discriminator network as a way to score a speech synthesis contest, but I didn’t do it because it would have been too easy to cheat by overfitting to a particular discriminator. Some of my friends wanted to train a generator network using a technique that would have taken gigabytes of data per image, even for the tiny images we studied with generative models in 2014. We were discussing the problem one night at a bar, and they asked me how to write a program that efficiently manages gigabytes of data per image on a GPU that, back then, had about 1.5 GB of RAM. I said that’s not a programming problem; it’s an algorithm design problem. Then I realized that a discriminator network could help a generator produce images if it were part of the learning process. I went home that night and started coding the first GAN.
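The setup Goodfellow describes, a discriminator folded into the generator’s learning process, can be sketched as a simple training loop. The PyTorch code below is a minimal illustration, assuming tiny fully connected networks; the architectures, optimizer settings, and hyperparameters are our own choices, not the code from the 2014 experiments.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28  # e.g., flattened 28x28 grayscale images

# Generator: maps random noise to an image.
# Discriminator: outputs a real-vs-fake logit.
# Both are deliberately tiny, for illustration only.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, image_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    b = real_batch.size(0)
    fake = G(torch.randn(b, latent_dim))

    # Discriminator step: score real images as 1 and generated ones as 0.
    # fake.detach() keeps this update from touching the generator.
    d_loss = (bce(D(real_batch), torch.ones(b, 1)) +
              bce(D(fake.detach()), torch.zeros(b, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator score fakes as real.
    g_loss = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each side improves against the other: as the discriminator gets better at spotting fakes, the generator must produce more convincing images to fool it.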
The Batch: How long did it take?
Goodfellow: By copying and pasting bits and pieces of earlier papers, I got the first GAN to produce MNIST images in only an hour or so of work. MNIST is such a small dataset that, even back then, you could train on it very quickly. I think it trained for tens of minutes before it could produce recognizable handwritten digits.
The Batch: What did it feel like to see the first face?
Goodfellow: That wasn’t as much of a revolutionary moment as people might expect. My colleague Bing Xu modeled face images from the Toronto Face Database, which were only 90 pixels square and grayscale. Because the faces were always centered and looking straight at the camera, even very simple algorithms like PCA could make pretty good faces. What surprised us most were the images it made of CIFAR-10, where there’s a lot of variability. They looked like crap. But we had been looking at crap from generative models for years, and we could tell that this was an exciting, new kind of crap.
The Batch: Has anything surprised you about the way this work has played out?
Goodfellow: In the first GAN paper, we included a list of things that might happen in future work. A lot of them did. The one big category I didn’t anticipate was domain-to-domain translation, as in GANs like CycleGAN from Berkeley: you can take a picture of a horse and have it transformed into a zebra without training on matched pairs of horse and zebra images. That’s very powerful because it can be easy to passively collect data in each domain, but it’s time-consuming and expensive to get data that matches up across domains.
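What makes unpaired translation possible is a cycle-consistency loss: translating a horse to a zebra and back should recover the original horse. Below is a minimal sketch of that loss, assuming two generator networks G (horses to zebras) and F (zebras to horses); the networks and the weight lam are placeholders, not the published CycleGAN architecture.

```python
import torch.nn.functional as F_nn

def cycle_consistency_loss(G, F, horses, zebras, lam=10.0):
    """G: horse -> zebra generator, F: zebra -> horse generator."""
    # Round-trip each batch through the other domain and back.
    horses_rebuilt = F(G(horses))   # horse -> zebra -> horse
    zebras_rebuilt = G(F(zebras))   # zebra -> horse -> zebra
    # Penalize how far the round trip drifts from the original image;
    # this constraint substitutes for matched horse/zebra pairs.
    return lam * (F_nn.l1_loss(horses_rebuilt, horses) +
                  F_nn.l1_loss(zebras_rebuilt, zebras))
```

In full training, this term is added to the usual adversarial losses for both generators.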
The Batch: What are you most hopeful about in GAN research?
Goodfellow: I’d like to see more use of GANs in the physical world, specifically for medicine. I’d like to see the community move toward more traditional science applications, where you have to get your hands dirty in the lab. That can lead to work with a more tangible, positive impact on people’s lives. For instance, in dentistry, GANs have been used to make personalized crowns for individual patients. Insilico Medicine is using GANs to design drugs.
The Batch: How much do you worry about bias in GAN output? The ability to produce realistic human faces makes it a pressing issue.
Goodfellow: GANs can be used to counteract biases in training data by generating training data for other machine learning algorithms. If there’s a language where you don’t have as much representation in your data, you can oversample that. At Apple, we were able to generate data for a gestural text-entry feature called QuickPath. A startup called Vue.ai uses GANs to generate images of what clothing would look like on different models. Traditionally, there may not have been much diversity in who was hired to be a model to try on this clothing. Now you can get a model who looks like you, wearing a specific item of clothing you’re interested in. These are baby steps, but I hope there are other ways GANs can be used to address issues of underrepresentation in datasets.
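As a concrete illustration of the oversampling idea, the sketch below draws synthetic examples of an underrepresented class from a trained class-conditional generator and returns them with their labels. The generator and its interface are hypothetical stand-ins for any conditional GAN.

```python
import torch

def oversample_minority(generator, latent_dim, minority_label, n_needed):
    """Draw synthetic examples of a rare class from a conditional GAN."""
    z = torch.randn(n_needed, latent_dim)             # random latent codes
    labels = torch.full((n_needed,), minority_label)  # all the rare class
    with torch.no_grad():
        synthetic = generator(z, labels)  # hypothetical conditional API
    return synthetic, labels
```

The synthetic batch would then be mixed with the real data so a downstream model trains on a more balanced class distribution.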