Less Data for Vision Transformers: Boosting Vision Transformer Performance with Less Data
Vision Transformer (ViT) outperformed convolutional neural networks in image classification, but it required more training data. New work enabled ViT and its variants to outperform other architectures with less training data.