Fan Yang, Kai Wu, Shuyi Zhang, Guannan Jiang, Yong Liu, Feng Zheng, Wei Zhang, Chengjie Wang, Long Zeng
Pseudo-label-based semi-supervised learning (SSL) has achieved great success in exploiting raw, unlabeled data. However, its training procedure suffers from confirmation bias caused by the noise in self-generated artificial labels. Moreover, the model's predictions become noisier in real-world applications that contain extensive out-of-distribution data. To address these issues, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), a drop-in helper that improves pseudo-label quality and enhances the model's robustness in real-world settings. Rather than treating real-world data as a single union set, our method handles the two kinds of data separately: reliable in-distribution data is clustered class-wise so that it blends into downstream tasks, while noisy out-of-distribution data is handled with image-wise contrastive learning for better generalization. Furthermore, by applying target re-weighting, we emphasize learning from clean labels and simultaneously reduce learning from noisy labels. Despite its simplicity, CCSSL achieves significant performance improvements over state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10. On the real-world dataset Semi-iNat 2021, we improve FixMatch by 9.80% and CoMatch by 3.18%. Code is available at https://github.com/TencentYoutuResearch/Classification-SemiCLS.
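
To make the split between class-wise clustering and image-wise contrast more concrete, the following is a minimal sketch of how a pairwise contrastive target matrix could be built from pseudo-label confidences. It is not taken from the released code. The function name `class_aware_targets`, the confidence threshold of 0.9, and the choice to re-weight a pair by the product of the two confidences are all assumptions made for illustration.

```python
def class_aware_targets(probs, threshold=0.9):
    """Build a pairwise target matrix from per-sample class probabilities.

    probs:     list of per-sample probability lists (pseudo-label outputs).
    threshold: hypothetical confidence cut-off; samples below it are treated
               as unreliable (possibly out-of-distribution).
    """
    conf = [max(p) for p in probs]             # confidence of each pseudo-label
    label = [p.index(max(p)) for p in probs]   # predicted class of each sample
    n = len(probs)
    targets = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Image-wise contrast: a sample is always its own positive.
                targets[i][j] = 1.0
            elif (label[i] == label[j]
                  and conf[i] >= threshold
                  and conf[j] >= threshold):
                # Class-wise clustering for reliable pairs, re-weighted by
                # the product of confidences (an assumed weighting scheme).
                targets[i][j] = conf[i] * conf[j]
    return targets


# Four samples, two classes; the third sample is low-confidence.
example_probs = [[0.95, 0.05], [0.98, 0.02], [0.55, 0.45], [0.03, 0.97]]
example_targets = class_aware_targets(example_probs)
```

In this sketch the first two samples are pulled together because they share a confident pseudo-label, while the low-confidence third sample keeps only itself as a positive, which corresponds to plain image-wise contrast. Such a matrix would typically serve as the target of a supervised-contrastive-style loss over feature similarities.
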
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (40 Labels, ImageNet-100 Unlabeled) | Accuracy | 30.89 | CCSSL |
| Image Classification | SVHN (40 Labels, ImageNet-100 Unlabeled) | Accuracy | 50.02 | CCSSL |
| Image Classification | STL-10 (1000 Labels, ImageNet-100 Unlabeled) | Accuracy | 82 | CCSSL |
| Image Classification | CIFAR-100 (10000 Labels, ImageNet-100 Unlabeled) | Accuracy | 71.12 | CCSSL |
| Image Classification | SVHN (1000 Labels, ImageNet-100 Unlabeled) | Accuracy | 88.6 | CCSSL |
| Image Classification | CIFAR-100, 2500 Labels | Percentage error | 24.3 | CCSSL(FixMatch) |
| Image Classification | CIFAR-10 (4000 Labels, ImageNet-100 Unlabeled) | Accuracy | 88.77 | CCSSL |
| Image Classification | CIFAR-10 (250 Labels, ImageNet-100 Unlabeled) | Accuracy | 67.2 | CCSSL |
| Image Classification | CIFAR-100, 10000 Labels | Percentage error | 19.32 | CCSSL(FixMatch) |
| Image Classification | CIFAR-100 (250 Labels, ImageNet-100 Unlabeled) | Accuracy | 56.3 | CCSSL |
| Image Classification | CIFAR-100, 400 Labels | Percentage error | 38.81 | CCSSL(FixMatch) |
| Image Classification | CIFAR-100 (400 Labels, ImageNet-100 Unlabeled) | Accuracy | 24.53 | CCSSL |
| Image Classification | SVHN (250 Labels, ImageNet-100 Unlabeled) | Accuracy | 80.39 | CCSSL |
| Semi-Supervised Image Classification | SVHN (40 Labels, ImageNet-100 Unlabeled) | Accuracy | 50.02 | CCSSL |
| Semi-Supervised Image Classification | STL-10 (1000 Labels, ImageNet-100 Unlabeled) | Accuracy | 82 | CCSSL |
| Semi-Supervised Image Classification | CIFAR-100 (10000 Labels, ImageNet-100 Unlabeled) | Accuracy | 71.12 | CCSSL |
| Semi-Supervised Image Classification | SVHN (1000 Labels, ImageNet-100 Unlabeled) | Accuracy | 88.6 | CCSSL |
| Semi-Supervised Image Classification | CIFAR-100, 2500 Labels | Percentage error | 24.3 | CCSSL(FixMatch) |
| Semi-Supervised Image Classification | CIFAR-10 (4000 Labels, ImageNet-100 Unlabeled) | Accuracy | 88.77 | CCSSL |
| Semi-Supervised Image Classification | CIFAR-10 (250 Labels, ImageNet-100 Unlabeled) | Accuracy | 67.2 | CCSSL |
| Semi-Supervised Image Classification | CIFAR-100, 10000 Labels | Percentage error | 19.32 | CCSSL(FixMatch) |
| Semi-Supervised Image Classification | CIFAR-100 (250 Labels, ImageNet-100 Unlabeled) | Accuracy | 56.3 | CCSSL |
| Semi-Supervised Image Classification | CIFAR-100, 400 Labels | Percentage error | 38.81 | CCSSL(FixMatch) |
| Semi-Supervised Image Classification | CIFAR-100 (400 Labels, ImageNet-100 Unlabeled) | Accuracy | 24.53 | CCSSL |
| Semi-Supervised Image Classification | SVHN (250 Labels, ImageNet-100 Unlabeled) | Accuracy | 80.39 | CCSSL |