Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, Yinghuan Shi
In this work, we revisit the weak-to-strong consistency framework, popularized by FixMatch in semi-supervised classification, where the prediction on a weakly perturbed image serves as supervision for its strongly perturbed version. Intriguingly, we observe that this simple pipeline already achieves competitive results against recent advanced works when transferred to our segmentation scenario. Its success relies heavily on manually designed strong data augmentations, however, which may be too limited to explore a broader perturbation space. Motivated by this, we propose an auxiliary feature perturbation stream as a supplement, leading to an expanded perturbation space. On the other hand, to sufficiently probe the original image-level augmentations, we present a dual-stream perturbation technique, in which two strong views are simultaneously guided by a common weak view. Consequently, our overall Unified Dual-Stream Perturbations approach (UniMatch) surpasses all existing methods significantly across all evaluation protocols on the Pascal, Cityscapes, and COCO benchmarks. Its superiority is also demonstrated in remote sensing interpretation and medical image analysis. We hope our reproduced FixMatch and our results can inspire more future work. Code and logs are available at https://github.com/LiheYoung/UniMatch.
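
The unsupervised objective described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the 0.95 confidence threshold, and the 0.25/0.25/0.5 loss weights are assumptions made for the example, and the four logit tensors are taken as given rather than produced by a segmentation network.

```python
import numpy as np


def softmax(logits, axis=1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def masked_cross_entropy(logits, pseudo_label, mask):
    """Per-pixel cross-entropy against pseudo-labels.

    Pixels where mask == 0 contribute nothing; the sum is divided by the
    total number of pixels (an assumed normalisation for this sketch).
    """
    log_prob = np.log(softmax(logits) + 1e-12)                # (B, C, H, W)
    picked = np.take_along_axis(log_prob, pseudo_label[:, None], axis=1)[:, 0]
    return float((-picked * mask).sum() / mask.size)


def consistency_loss(logits_w, logits_s1, logits_s2, logits_fp, threshold=0.95):
    """Weak-to-strong consistency with two strong views and a feature-perturbed view.

    logits_w  : predictions on the weakly perturbed image (source of pseudo-labels)
    logits_s1 : predictions on the first strongly perturbed image
    logits_s2 : predictions on the second strongly perturbed image
    logits_fp : predictions from perturbed features of the weak view
    All arrays have shape (B, C, H, W). threshold and weights are assumed values.
    """
    prob_w = softmax(logits_w)
    pseudo_label = prob_w.argmax(axis=1)                      # (B, H, W)
    mask = (prob_w.max(axis=1) >= threshold).astype(np.float64)

    loss_s1 = masked_cross_entropy(logits_s1, pseudo_label, mask)
    loss_s2 = masked_cross_entropy(logits_s2, pseudo_label, mask)
    loss_fp = masked_cross_entropy(logits_fp, pseudo_label, mask)

    # Both strong views share one weak view; the feature stream is weighted separately.
    return 0.25 * loss_s1 + 0.25 * loss_s2 + 0.5 * loss_fp
```

In a real training loop the weak-view prediction would be treated as a fixed target (no gradient flows through it), and this term would be added to the ordinary supervised loss on the labeled images.
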
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Medical Image Segmentation | ACDC 5% labeled data | Dice (Average) | 87.61 | UniMatch |
| Medical Image Segmentation | ACDC 10% labeled data | Dice (Average) | 89.92 | UniMatch |
| Medical Image Segmentation | ACDC 20% labeled data | Dice (Average) | 90.47 | UniMatch |
| Semantic Segmentation | COCO 1/512 labeled | Validation mIoU | 31.9 | UniMatch |
| Semantic Segmentation | COCO 1/256 labeled | Validation mIoU | 38.9 | UniMatch |
| Semantic Segmentation | ADE20K 1/16 labeled | Validation mIoU | 31.5 | UniMatch |
| Semantic Segmentation | PASCAL VOC 2012 6.25% labeled | Validation mIoU | 80.94 | UniMatch (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 92 labeled | Validation mIoU | 75.2 | UniMatch (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | ADE20K 1/32 labeled | Validation mIoU | 28.1 | UniMatch |
| Semantic Segmentation | PASCAL VOC 2012 732 labeled | Validation mIoU | 79.9 | UniMatch (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 1464 labeled | Validation mIoU | 81.2 | UniMatch (DeepLab v3 with ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 25% labeled | Validation mIoU | 80.43 | UniMatch (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | COCO 1/128 labeled | Validation mIoU | 44.5 | UniMatch |
| Semantic Segmentation | COCO 1/64 labeled | Validation mIoU | 48.2 | UniMatch |
| Semantic Segmentation | Cityscapes 100 samples labeled | Validation mIoU | 73 | UniMatch (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 366 labeled | Validation mIoU | 78.8 | UniMatch (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | Cityscapes 6.25% labeled | Validation mIoU | 76.59 | UniMatch (DeepLab v3+ with ResNet-101 pretrained on ImageNet-1K) |
| Semantic Segmentation | COCO 1/32 labeled | Validation mIoU | 49.8 | UniMatch |
| Semantic Segmentation | PASCAL VOC 2012 183 labeled | Validation mIoU | 77.2 | UniMatch (DeepLab v3+ with ResNet-101) |
| Change Detection | WHU - 20% labeled data | IoU | 81.7 | UniMatch |
| Change Detection | WHU - 5% labeled data | IoU | 80.2 | UniMatch |
| Change Detection | WHU - 10% labeled data | IoU | 81.7 | UniMatch |
| Change Detection | WHU - 40% labeled data | IoU | 85.1 | UniMatch |
| Change Detection | LEVIR-CD - 10% labeled data | IoU | 82 | UniMatch |
| Change Detection | LEVIR-CD - 5% labeled data | IoU | 80.7 | UniMatch |
| Change Detection | LEVIR-CD - 20% labeled data | IoU | 81.7 | UniMatch |
| Change Detection | LEVIR-CD - 40% labeled data | IoU | 82.1 | UniMatch |
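
For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute mean IoU and mean Dice from predicted and ground-truth label maps. It is an illustration only: the function names are hypothetical, it returns fractions rather than percentages, and individual benchmark protocols may differ (for example by ignoring certain label values or excluding the background class).

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that appear in pred or target."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())


def mean_dice(pred, target, num_classes):
    """Mean Dice coefficient over classes that appear in pred or target."""
    cm = confusion_matrix(pred, target, num_classes)
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1)
    valid = denom > 0
    return float((2 * tp[valid] / denom[valid]).mean())


if __name__ == "__main__":
    target = np.array([[0, 0], [1, 1]])
    pred = np.array([[0, 1], [1, 1]])
    print(mean_iou(pred, target, num_classes=2))
    print(mean_dice(pred, target, num_classes=2))
```
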