A simple linear classifier paired with a self-supervised feature extractor outperformed a supervised deep learning model on ImageNet, according to new research.
What’s new: Ting Chen and colleagues at Google Brain devised a self-supervised training method, one that trains a model on unlabeled data to produce features useful for other tasks. Their approach, SimCLR (a simple framework for contrastive learning of visual representations), compares differently modified versions of the same image, so a model learns to extract feature representations that are consistent between the two.
Key insight: An image and variations of it produced by data-augmentation techniques such as rotation have similar features, so similar that they are more informative than the labels such images might share. SimCLR trains a model to extract features that are unchanged by such transformations, a technique known as contrastive learning.
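To make the augmentation idea concrete, here is a minimal sketch of a pipeline that turns one image into two correlated views. It is an illustration written with PyTorch and torchvision rather than the authors' implementation, and the specific transforms and their strengths are assumptions, not the paper's tuned settings.

```python
import torch
from torchvision import transforms
from PIL import Image

# Illustrative augmentation pipeline; the transforms and parameters are
# assumptions for the sake of the example, not the paper's exact recipe.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

def two_views(image):
    """Apply the random pipeline twice to get a positive pair of views."""
    return augment(image), augment(image)
```

Passing both views through the same encoder yields the pair of feature vectors that the training objective compares.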
How it works: Unlike some other contrastive learning techniques, SimCLR can be used with any model architecture. It requires only a set of data-augmentation methods (which the researchers specify only for images, the subject of this study).
- During training, the researchers modify ImageNet examples, producing pairs of differently augmented versions of the same image, altered by combinations of cropping, flipping, rotation, color distortion, blur, and noise.
- SimCLR trains a model to extract a feature vector from each version such that the angle between the two vectors is as small as possible, while the features of different images in the same batch are pushed apart (see the contrastive-loss sketch after this list).
- The trained model can extract features from a labeled dataset for a downstream classifier that, in turn, learns to map the features to the labels. Alternatively, the model can be fine-tuned using labeled data, in effect using SimCLR as an unsupervised pre-training algorithm.
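The bullet about minimizing the angle between feature vectors corresponds to a contrastive loss over cosine similarities. The sketch below is a minimal PyTorch rendering of that idea, assuming a batch of paired features; the temperature value and the function name contrastive_loss are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """Pull each image's two views together, push other images in the batch apart.

    z1, z2: [batch, dim] feature vectors for the two augmented views.
    """
    batch = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)            # [2B, dim]
    z = F.normalize(z, dim=1)                 # compare directions (angles) only
    sim = z @ z.t() / temperature             # pairwise cosine similarities
    # Mask the diagonal so an example is never its own positive.
    mask = torch.eye(2 * batch, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # For row i < batch the positive is row i + batch (the other view), and vice versa.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(sim.device))
```

In a full training loop, z1 and z2 would be the encoder's outputs for the two augmented views of each image in a batch (the paper additionally passes them through a small projection head before computing the loss).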
Results: A ResNet-50 (4x) trained with SimCLR extracted features from ImageNet without using its labels. A linear classifier trained on those features using all of ImageNet's labels achieved 76.5 percent top-1 accuracy, 0.1 percent better than a fully supervised ResNet-50. SimCLR achieved similar results on a variety of other image datasets.
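As a rough illustration of this linear-evaluation setup, the sketch below freezes a pretrained encoder and trains only a linear layer on its features. The encoder, the feature dimension, and the optimizer settings are assumptions for the sake of the example, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_linear_probe(encoder, loader, feature_dim=2048, num_classes=1000, epochs=10):
    """Freeze a self-supervised encoder and train only a linear classifier on top."""
    encoder.eval()                          # frozen: no gradient updates
    for p in encoder.parameters():
        p.requires_grad = False

    probe = nn.Linear(feature_dim, num_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)

    for _ in range(epochs):
        for images, labels in loader:       # labeled data is used only here
            with torch.no_grad():
                feats = encoder(images)     # fixed features from the frozen model
            loss = F.cross_entropy(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```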
Why it matters: Self-supervised learning schemes often rely on complicated tasks to extract features from unlabeled data. SimCLR simply learns to extract similar features from different views of the same example.
We’re thinking: This method seems like it would work well on audio data. We’re curious to see how effective it can be with text and other data types based on alphanumeric characters.