The previous state-of-the-art image classifier was trained on the ImageNet dataset plus 3.5 billion supplemental images from a different database. A new method achieved higher accuracy with one-tenth as many supplemental examples — and they were unlabeled, to boot.
What’s new: Qizhe Xie and a team at Google Brain, along with Carnegie Mellon’s Eduard Hovy, introduced a method they call Noisy Student, in which a model learns from another model in a teacher-student relationship. Noisy Student achieved better performance on ImageNet.
Key insight: In the learning approach known as self-training, a model that’s designated the teacher trains on labeled data and then generates pseudo-labels on unlabeled data. Then a student model trains on both the labeled data and the pseudo-labeled data. Noisy Student adds two tweaks: The student network is larger than the teacher’s, and the student’s training data is adulterated with noise.
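To make the idea concrete, here’s a minimal sketch of the teacher-student setup: a teacher pseudo-labels unlabeled data, and a larger student trains on labeled and pseudo-labeled examples together. It isn’t the authors’ code; a small fully connected network and synthetic tensors stand in for EfficientNet and ImageNet, and the sizes and hyperparameters are purely illustrative.

```python
# Minimal self-training sketch (not the authors' code). A teacher trained on
# labeled data assigns pseudo-labels to unlabeled data; a larger student then
# trains on both. Model widths, dataset sizes, and hyperparameters are illustrative.
import torch
import torch.nn as nn

def make_classifier(width):
    # Stand-in for EfficientNet: a tiny classifier over 32-dim "features".
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 10))

def train(model, x, y, epochs=5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Synthetic stand-ins for a labeled set and a much larger unlabeled set.
x_labeled, y_labeled = torch.randn(512, 32), torch.randint(0, 10, (512,))
x_unlabeled = torch.randn(4096, 32)

# 1) The teacher trains on labeled data only.
teacher = make_classifier(width=64)
train(teacher, x_labeled, y_labeled)

# 2) The teacher assigns hard pseudo-labels to the unlabeled data.
with torch.no_grad():
    pseudo_labels = teacher(x_unlabeled).argmax(dim=1)

# 3) A larger student trains on labeled plus pseudo-labeled data.
student = make_classifier(width=256)
train(student,
      torch.cat([x_labeled, x_unlabeled]),
      torch.cat([y_labeled, pseudo_labels]))
```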
How it works: Both teacher and student use an EfficientNet architecture, which scales readily to higher capacity. That suits the student, which has more parameters and processes more data than the teacher.
- The teacher is trained on ImageNet’s training set. It then predicts pseudo-labels for 300 million unlabeled images from Google’s private JFT dataset.
- The student’s training data consists of ImageNet’s training set plus, for each pseudo-label class, the 130,000 JFT images with the highest-confidence predictions.
- During the student’s training, the algorithm applies data augmentation to the input and uses dropout and the stochastic depth method to add noise to the model itself (see the code sketch after this list). These steps nudge the student to generalize beyond its teacher’s ability.
- The teacher-student training cycle can be repeated, treating each previous student as a new teacher.
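Put together, the loop might look roughly like the sketch below. It’s a hedged simplification rather than the authors’ implementation: Gaussian input noise stands in for RandAugment-style data augmentation, dropout alone stands in for the model noise (stochastic depth is omitted), and the per-class counts and model widths are toy-scale rather than the paper’s 130,000 images per class and EfficientNet variants.

```python
# Hedged sketch of the Noisy Student loop (not the authors' implementation).
# Adds per-class confidence filtering of pseudo-labeled data, noise during
# student training, and iteration with each student promoted to teacher.
import torch
import torch.nn as nn

NUM_CLASSES, PER_CLASS = 10, 100   # toy numbers; the paper keeps 130,000 images per class

def make_model(width):
    # Stand-in for an EfficientNet variant: width sets capacity, dropout adds model noise.
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(),
                         nn.Dropout(p=0.5), nn.Linear(width, NUM_CLASSES))

def select_per_class(probs, per_class):
    # For each pseudo-label class, keep the examples the teacher is most confident about.
    conf, labels = probs.max(dim=1)
    keep = []
    for c in range(NUM_CLASSES):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        keep.append(idx[conf[idx].argsort(descending=True)][:per_class])
    keep = torch.cat(keep)
    return keep, labels[keep]

def train_noisy(model, x, y, epochs=5, noise_std=0.1):
    # Gaussian input noise (a crude stand-in for data augmentation) plus dropout.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    model.train()                                     # keeps dropout active
    for _ in range(epochs):
        opt.zero_grad()
        x_noisy = x + noise_std * torch.randn_like(x)
        loss_fn(model(x_noisy), y).backward()
        opt.step()

# Synthetic stand-ins for ImageNet (labeled) and JFT (unlabeled) features.
x_labeled, y_labeled = torch.randn(512, 32), torch.randint(0, NUM_CLASSES, (512,))
x_unlabeled = torch.randn(4096, 32)

# The first teacher trains on labeled data only, without added input noise.
teacher = make_model(width=64)
train_noisy(teacher, x_labeled, y_labeled, noise_std=0.0)

for width in (128, 256, 512):                         # each student is larger than its teacher
    teacher.eval()                                    # no dropout when generating pseudo-labels
    with torch.no_grad():
        probs = teacher(x_unlabeled).softmax(dim=1)
    keep, pseudo = select_per_class(probs, PER_CLASS)

    student = make_model(width)
    train_noisy(student,
                torch.cat([x_labeled, x_unlabeled[keep]]),
                torch.cat([y_labeled, pseudo]))
    teacher = student                                 # the student becomes the next teacher
```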
Results: Noisy Student improved state-of-the-art accuracy on ImageNet as a whole and on specialized subsets. On ImageNet, it increased top-5 accuracy, meaning the true label was among the top five predictions, by 0.2 percentage points to 98.2 percent. Noisy Student also boosted top-1 accuracy by 1 percentage point to 87.4 percent. Furthermore, it matched or exceeded previously established records on ImageNet-A, -C, and -P, subsets whose images are commonly misclassified (A) or have been corrupted (C) or perturbed (P).
Why it matters: These results are another step forward for using unlabeled data to boost image classification accuracy.
We’re thinking: Unlabeled examples are far more plentiful than labeled ones. Techniques like this may be key to enabling learning algorithms to exploit far more data than was possible before.