A simple linear classifier paired with a self-supervised feature extractor outperformed a supervised deep learning model on ImageNet, according to new research.
What’s new: Ting Chen and colleagues at Google Brain devised a self-supervised training method, one that trains a model on unlabeled data to produce features useful for other tasks. Their approach, SimCLR (a simple framework for contrastive learning of visual representations), compares differently modified versions of the same image, so a model learns to extract feature representations that are consistent between the two.
Key insight: An image and variations of it produced by data-augmentation techniques such as rotation have similar features, so similar that they are more informative than the labels such images might share. SimCLR trains a model to extract features that are unchanged by such transformations, a technique known as contrastive learning.
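To make the augmentation idea concrete, here is a minimal sketch of a pipeline that turns one image into two correlated views. It is an illustration written with PyTorch and torchvision rather than the authors' implementation, and the specific transforms and their strengths are assumptions, not the paper's tuned settings.

```python
import torch
from torchvision import transforms
from PIL import Image

# Illustrative augmentation pipeline; the transforms and parameters are
# assumptions for the sake of the example, not the paper's exact recipe.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

def two_views(image):
    """Apply the random pipeline twice to get a positive pair of views."""
    return augment(image), augment(image)
```

Passing both views through the same encoder yields the pair of feature vectors that the training objective compares.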
How it works: Unlike some other contrastive learning techniques, SimCLR can be used with any model architecture. It requires only a set of data-augmentation methods (which the researchers specify only for images, the subject of this study).
- During training, the researchers modify ImageNet examples, producing pairs of differently augmented versions of the same image, altered by combinations of cropping, flipping, rotation, color distortion, blur, and noise.
- SimCLR trains a model to extract a feature vector from each version such that the angle between the two vectors is as small as possible, while the features of different images in the same batch are pushed apart (see the contrastive-loss sketch after this list).
- The trained model can extract features from a labeled dataset for a downstream classifier that, in turn, learns to map the features to the labels. Alternatively, the model can be fine-tuned using labeled data, in effect using SimCLR as an unsupervised pre-training algorithm.
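The bullet about minimizing the angle between feature vectors corresponds to a contrastive loss over cosine similarities. The sketch below is a minimal PyTorch rendering of that idea, assuming a batch of paired features; the temperature value and the function name contrastive_loss are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """Pull each image's two views together, push other images in the batch apart.

    z1, z2: [batch, dim] feature vectors for the two augmented views.
    """
    batch = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)            # [2B, dim]
    z = F.normalize(z, dim=1)                 # compare directions (angles) only
    sim = z @ z.t() / temperature             # pairwise cosine similarities
    # Mask the diagonal so an example is never its own positive.
    mask = torch.eye(2 * batch, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # For row i < batch the positive is row i + batch (the other view), and vice versa.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(sim.device))
```

In a full training loop, z1 and z2 would be the encoder's outputs for the two augmented views of each image in a batch (the paper additionally passes them through a small projection head before computing the loss).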
Results: A ResNet-50 (4x) trained with SimCLR extracted features from ImageNet without using its labels. A linear classifier trained on those features using all of ImageNet's labels achieved 76.5 percent top-1 accuracy, 0.1 percent better than a fully supervised ResNet-50. SimCLR achieved similar results on a variety of other image datasets.
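As a rough illustration of this linear-evaluation setup, the sketch below freezes a pretrained encoder and trains only a linear layer on its features. The encoder, the feature dimension, and the optimizer settings are assumptions for the sake of the example, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_linear_probe(encoder, loader, feature_dim=2048, num_classes=1000, epochs=10):
    """Freeze a self-supervised encoder and train only a linear classifier on top."""
    encoder.eval()                          # frozen: no gradient updates
    for p in encoder.parameters():
        p.requires_grad = False

    probe = nn.Linear(feature_dim, num_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)

    for _ in range(epochs):
        for images, labels in loader:       # labeled data is used only here
            with torch.no_grad():
                feats = encoder(images)     # fixed features from the frozen model
            loss = F.cross_entropy(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```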
Why it matters: Self-supervised learning schemes often rely on complicated tasks to extract features from unlabeled data. SimCLR simply learns to extract similar features from different views of the same example.
We’re thinking: This method seems like it would work well on audio data. We’re curious to see how effective it can be with text and other data types based on alphanumeric characters.