Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Semi-supervised Vision Transformers at Scale

Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

2022-08-11 · Semi-Supervised Image Classification

Abstract

We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism to interpolate unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs with weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracies. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% labels, which is comparable with Inception-v4 using 100% ImageNet labels.
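The two ingredients highlighted in the abstract — the EMA-Teacher update and mixup-style interpolation of unlabeled samples with their pseudo labels — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`ema_update`, `pseudo_mixup`) and the Beta-distribution mixing coefficient are assumptions, and the paper's *probabilistic* pseudo mixup involves more than the plain interpolation shown here.

```python
import numpy as np

def ema_update(teacher_params, student_params, decay=0.999):
    """EMA-Teacher step: teacher <- decay * teacher + (1 - decay) * student.

    The teacher is never updated by gradients; it lags the student as a
    smoothed copy and produces the pseudo labels for unlabeled data.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

def pseudo_mixup(x_a, x_b, p_a, p_b, alpha=0.8, rng=None):
    """Interpolate two unlabeled inputs and their pseudo-label distributions.

    A single coefficient lam ~ Beta(alpha, alpha) mixes both the images and
    the soft pseudo labels, so the target stays consistent with the input.
    """
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed for reproducibility
    lam = rng.beta(alpha, alpha)
    x = lam * x_a + (1.0 - lam) * x_b
    p = lam * p_a + (1.0 - lam) * p_b
    return x, p
```

In a training loop, the teacher's predictions on weakly augmented unlabeled images would supply `p_a`/`p_b`, the student is trained on the mixed pairs, and `ema_update` is applied after each optimizer step.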

Results

Task | Dataset | Metric | Value | Model
Image Classification | ImageNet - 1% labeled data | Top 5 Accuracy | 93.1 | Semi-ViT (ViT-Huge)
Semi-Supervised Image Classification | ImageNet - 1% labeled data | Top 5 Accuracy | 93.1 | Semi-ViT (ViT-Huge)

Related Papers

ViTSGMM: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels (2025-06-04)
Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning (2025-05-26)
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization (2025-05-12)
Weakly Semi-supervised Whole Slide Image Classification by Two-level Cross Consistency Supervision (2025-04-16)
Diff-SySC: An Approach Using Diffusion Models for Semi-Supervised Image Classification (2025-02-25)
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations (2024-10-03)
Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image Classification (2024-07-04)
A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning (2024-04-27)