An automated art critic spells out the emotional impact of images.
What’s new: Led by Panos Achlioptas, researchers at École Polytechnique, King Abdullah University, and Stanford University trained a deep learning system to generate subjective interpretations of art.
How it works: The robot reviewer is a showcase for the authors’ dataset ArtEmis, which pairs images with subjective commentary. ArtEmis comprises around 81,500 paintings, photos, and other images from the online encyclopedia WikiArt, along with crowdsourced labels that describe the emotional character of each work (“amusement,” “awe,” “sadness,” and so on) and brief explanations of how the work inspired those emotions (to explain amusement, for instance: “His mustache looks like a bird soaring through the clouds”).
- The researchers trained a model based on Show-Attend-Tell, which combines a convolutional neural network and an LSTM outfitted with attention, to replicate the annotations.
- As a baseline, they used a ResNet-32 pretrained on ImageNet. Given a test image, they used a nearest neighbor search to find the most similar image in the ArtEmis training set. Then they chose one of that image’s captions at random.
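The baseline described above can be sketched in a few lines: embed each image as a feature vector, find the training image whose vector is most similar to the query’s, and return one of its captions at random. This is a minimal illustration, not the authors’ code; the function name `baseline_caption` and the cosine-similarity search are assumptions (in practice the features would come from the pretrained ResNet, and the search might use a different distance).

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def baseline_caption(query_feature, train_features, train_captions, rng=random):
    """Return a caption for the query image by copying one, chosen at
    random, from the most visually similar training image."""
    best = max(range(len(train_features)),
               key=lambda i: cosine_similarity(query_feature, train_features[i]))
    return rng.choice(train_captions[best])
```

Here `train_features` stands in for precomputed image embeddings and `train_captions` holds each training image’s crowdsourced annotations.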
Results: Volunteers guessed whether a given caption came from Show-Attend-Tell or a human, and roughly half the model’s captions passed as human-written. Nonetheless, the authors found the model’s output on average less accurate, imaginative, and diverse than the human annotations. The team also compared generated and baseline captions using a number of natural language metrics. Show-Attend-Tell achieved a ROUGE-L score of 0.295 versus the baseline’s 0.208, and a METEOR score of 0.139 versus the baseline’s 0.100 (a perfect score on either metric being 1.0).
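ROUGE-L scores a candidate caption against a reference by the length of their longest common subsequence of words. The sketch below computes the F1 variant of the score; it’s illustrative only (published evaluations typically use an established implementation, and some report a recall-weighted F-measure instead).

```python
def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """ROUGE-L (F1 variant) between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; captions sharing no words score 0.0.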
Behind the news: Other AI systems have probed the elusive relationship between images and emotions, especially images of human faces. For instance, a GAN has been built to generate synthetic faces that express one of eight emotions, and some software vendors have dubiously claimed to evaluate job candidates based on their facial expressions.
Why it matters: When humans look at an image, they perceive meanings beyond the subject matter displayed in the frame. Systems that help people make sense of the world could benefit from the ability to make such subjective judgments, whether they’re evaluating artworks, product recommendations, medical images, or flaws in manufactured goods.
We’re thinking: Show-Attend-Tell’s soft deterministic attention mechanism makes us feel like we’re looking at a dream.