Generative networks can embroider sentences into stories and melodies into full-fledged arrangements. A new model does something similar with drawings.
What’s new: Researchers at the University of Oxford, Adobe Research, and UC Berkeley introduce a model that interactively fills in virtual pencil sketches. SkinnyResNet turns crude lines drawn in a web browser into photorealistic pictures complete with colors and textures.
Key insight: Most sketch-to-image networks require users to create a complete sketch before transforming it into a finished picture. To bridge the gap between initial strokes and completed outlines, the model starts conjuring detailed images from the first pencil mark.
How it works: The system is based on two generative adversarial networks. A sketch-completion GAN predicts what the user aims to draw, and an image-generation GAN acts on the prediction to generate an image.
- The authors constructed an outline-to-image dataset comprising 200 pairs in 10 classes. They obtained the images by searching Google and extracted the outlines digitally.
- The sketch-completion GAN generates a complete outline from the current state of a user’s sketch. It was trained on partial outlines created by deleting random patches from full outlines (one way to do this appears in the code sketch after this list).
- The user chooses a class of object to sketch. The image-generation GAN takes the predicted sketch and object class, and generates a photorealistic image.
- Another neural network steers the image-generation GAN toward the selected object class. The generator is built from convolutional layers, and the control network toggles individual channels on or off depending on the class. In this way, different channels specialize in generating different image classes.
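To make the pipeline concrete, here is a minimal PyTorch sketch of the ideas above, not the authors’ implementation: erasing random patches from a full outline yields training data for the sketch-completion GAN, and a class-conditioned gate switches generator channels on or off so the image-generation GAN renders the selected class. All module names, layer sizes, and the soft sigmoid gate are illustrative assumptions.

```python
import torch
import torch.nn as nn


def mask_random_patches(outline: torch.Tensor, num_patches: int = 4, patch_size: int = 16) -> torch.Tensor:
    """Delete random square patches from a full outline (1 x H x W, strokes assumed to be 1 on a 0 background)
    to simulate the partial sketches used to train the sketch-completion GAN."""
    partial = outline.clone()
    _, h, w = partial.shape
    for _ in range(num_patches):
        y = torch.randint(0, h - patch_size + 1, (1,)).item()
        x = torch.randint(0, w - patch_size + 1, (1,)).item()
        partial[:, y:y + patch_size, x:x + patch_size] = 0.0  # erase one patch
    return partial


class SketchCompletionG(nn.Module):
    """Toy stand-in for the sketch-completion generator: partial outline in, predicted full outline out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, partial_outline):
        return self.net(partial_outline)


class GatedConvBlock(nn.Module):
    """Convolutional block whose channels are switched on or off by a class-conditioned gate,
    so different channels can specialize in different object classes."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Sequential(nn.Linear(num_classes, channels), nn.Sigmoid())

    def forward(self, x, class_onehot):
        g = self.gate(class_onehot)          # (batch, channels), values in (0, 1)
        g = g.view(x.size(0), -1, 1, 1)      # broadcast over spatial dimensions
        return torch.relu(self.conv(x)) * g  # per-channel on/off switch


class ImageGenerationG(nn.Module):
    """Toy class-conditional image generator: completed outline + class one-hot -> RGB image."""
    def __init__(self, num_classes: int, channels: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1)
        self.blocks = nn.ModuleList([GatedConvBlock(channels, num_classes) for _ in range(3)])
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, outline, class_onehot):
        h = torch.relu(self.stem(outline))
        for block in self.blocks:
            h = block(h, class_onehot)
        return torch.tanh(self.to_rgb(h))    # RGB image in [-1, 1]


# Inference flow: partial user sketch -> predicted full outline -> class-conditioned image.
num_classes = 10
completion_g = SketchCompletionG()
image_g = ImageGenerationG(num_classes)

full_outline = torch.rand(1, 64, 64)                        # stand-in for a training outline
partial = mask_random_patches(full_outline).unsqueeze(0)    # batch of one partial sketch
predicted_outline = completion_g(partial)
class_onehot = torch.zeros(1, num_classes)
class_onehot[0, 3] = 1.0                                    # user-selected class, e.g. index 3
image = image_g(predicted_outline, class_onehot)            # (1, 3, 64, 64)
print(image.shape)
```

A soft sigmoid gate is used here for simplicity; a hard 0/1 switch per class would match the “toggle” description more literally.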
Results: Arnab Ghosh and colleagues compared their model’s output with that of an encoder-decoder network inspired by MUNIT. They fine-tuned a pretrained Inception v3 network on their dataset and used it to classify images generated by both models. The classifier correctly identified 97 percent of SkinnyResNet images compared with 92.7 percent of the encoder-decoder’s output. A group of human labelers classified 23 percent of SkinnyResNet’s output as real images, while labeling only 14.1 percent of the encoder-decoder’s output as real.
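The classifier-based evaluation can be sketched in a few lines as well. The snippet below is an assumed setup rather than the authors’ exact recipe: it fine-tunes a pretrained Inception v3 on real photos arranged one folder per class (the paths, epoch count, and learning rate are hypothetical), then measures how often the classifier assigns generated images to their intended classes.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Inception v3 expects 299x299 inputs and ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

num_classes = 10
device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from ImageNet weights and swap in a 10-way head (main and auxiliary classifiers).
classifier = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
classifier.fc = nn.Linear(classifier.fc.in_features, num_classes)
classifier.AuxLogits.fc = nn.Linear(classifier.AuxLogits.fc.in_features, num_classes)
classifier = classifier.to(device)

# Hypothetical folder layout: real photos for fine-tuning and generated images to score,
# each arranged as one subdirectory per class (ImageFolder convention).
real_loader = DataLoader(datasets.ImageFolder("data/real", preprocess), batch_size=32, shuffle=True)
fake_loader = DataLoader(datasets.ImageFolder("data/generated", preprocess), batch_size=32)

# Fine-tune briefly on the real images.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
classifier.train()
for epoch in range(3):
    for images, labels in real_loader:
        images, labels = images.to(device), labels.to(device)
        out = classifier(images)  # train mode returns InceptionOutputs(logits, aux_logits)
        loss = loss_fn(out.logits, labels) + 0.4 * loss_fn(out.aux_logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Score generated images: how often does the classifier recover the intended class?
classifier.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in fake_loader:
        images, labels = images.to(device), labels.to(device)
        preds = classifier(images).argmax(dim=1)  # eval mode returns plain logits
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"Generated images classified correctly: {100 * correct / total:.1f}%")
```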
Why it matters: We’ve come a long way since Photoshop 1.0, and this research may offer a glimpse of the design tools to come. Rather than passive programs modeled after real-world items like pencils and paintbrushes, such tools might evolve into proactive assistants that help designers visualize and finish their creations.
We’re thinking: Why stop at drawing? Tools for writing and music composition are already headed in this direction. Other creative pursuits like 3D modeling, mechanical design, architecture, and choreography could take advantage of similar generative techniques.