WooSeok Shin, Hyun Joon Park, Jin Sob Kim, Sung Won Han
In semi-supervised semantic segmentation, Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. Despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, which limits their scalability and compatibility. In this paper, we propose PrevMatch, a framework that mitigates these limitations by maximizing the use of temporal knowledge obtained during training. PrevMatch relies on two core strategies: (1) we reconsider the use of temporal knowledge and directly use previous models obtained during training to generate additional pseudo-label guidance, referred to as previous guidance; and (2) we design a highly randomized ensemble strategy to maximize the effectiveness of the previous guidance. Experimental results on four benchmark semantic segmentation datasets confirm that the proposed method consistently outperforms existing methods across various evaluation protocols. In particular, with a DeepLabV3+ and ResNet-101 network setting, PrevMatch outperforms the existing state-of-the-art method, Diverse Co-training, by +1.6 mIoU on Pascal VOC with only 92 annotated images, while training 2.4 times faster. Furthermore, the results indicate that PrevMatch induces stable optimization, particularly benefiting classes that exhibit poor performance. Code is available at https://github.com/wooseok-shin/PrevMatch
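
The two strategies in the abstract can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that); it only shows one plausible reading of a "randomized ensemble" over predictions from previously saved models. The function name `random_ensemble`, the uniform sampling of the subset size, and the normalized random weights are all assumptions made for illustration, and plain Python lists stand in for per-pixel class probabilities.

```python
import random


def random_ensemble(prev_probs, rng):
    """Mix predictions of previously saved models with random weights.

    prev_probs: list over saved models; each entry is a list of pixels,
                each pixel a list of class probabilities.
    Returns (mixed_probs, pseudo_labels).
    Hypothetical helper: the sampling scheme is an assumption, not the
    paper's exact procedure.
    """
    num_models = len(prev_probs)
    # Randomly choose how many previous models to use, then which ones.
    k = rng.randint(1, num_models)
    chosen = rng.sample(range(num_models), k)
    # Random positive weights, normalized so they sum to 1.
    raw = [rng.random() + 1e-8 for _ in chosen]
    total = sum(raw)
    weights = [w / total for w in raw]

    num_pixels = len(prev_probs[0])
    num_classes = len(prev_probs[0][0])
    mixed = [
        [
            sum(w * prev_probs[m][p][c] for w, m in zip(weights, chosen))
            for c in range(num_classes)
        ]
        for p in range(num_pixels)
    ]
    # The pseudo-label is the class with the highest mixed probability.
    labels = [max(range(num_classes), key=row.__getitem__) for row in mixed]
    return mixed, labels


# Three toy "previous models", two pixels, three classes.
prev = [
    [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]],
    [[0.1, 0.8, 0.1], [0.3, 0.1, 0.6]],
]
mixed, labels = random_ensemble(prev, random.Random(0))
print(labels)
```

Because every toy model above agrees on the most likely class for each pixel, the pseudo-labels are the same for any random draw; in practice the randomness matters when the previous models disagree.
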
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | COCO 1/256 labeled | Validation mIoU | 40.2 | PrevMatch |
| Semantic Segmentation | Pascal VOC 2012 6.25% labeled | Validation mIoU | 81.4 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 92 labeled | Validation mIoU | 77.0 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 92 labeled | Validation mIoU | 73.4 | PrevMatch (ResNet-50) |
| Semantic Segmentation | PASCAL VOC 2012 732 labeled | Validation mIoU | 80.4 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 732 labeled | Validation mIoU | 78.6 | PrevMatch (ResNet-50) |
| Semantic Segmentation | PASCAL VOC 2012 1464 labels | Validation mIoU | 81.6 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 1464 labels | Validation mIoU | 79.3 | PrevMatch (ResNet-50) |
| Semantic Segmentation | PASCAL VOC 2012 25% labeled | Validation mIoU | 80.8 | PrevMatch (ResNet-101) |
| Semantic Segmentation | COCO 1/128 labeled | Validation mIoU | 45.7 | PrevMatch |
| Semantic Segmentation | COCO 1/64 labeled | Validation mIoU | 48.4 | PrevMatch |
| Semantic Segmentation | Pascal VOC 2012 12.5% labeled | Validation mIoU | 81.9 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 366 labeled | Validation mIoU | 79.6 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 366 labeled | Validation mIoU | 77.5 | PrevMatch (ResNet-50) |
| Semantic Segmentation | PASCAL VOC 2012 183 labeled | Validation mIoU | 78.5 | PrevMatch (ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 183 labeled | Validation mIoU | 75.4 | PrevMatch (ResNet-50) |
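
For readers unfamiliar with the metric in the table, the following is a minimal sketch of how mean Intersection-over-Union (mIoU) is commonly computed from predicted and ground-truth labels. The function name `mean_iou`, the flat label lists, and the choice to skip classes that never appear are assumptions for illustration; benchmark code typically accumulates the counts over the whole validation set, and `ignore_index=255` is assumed here as the void label.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return mIoU in percent for flat lists of predicted and true labels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # void pixels do not count toward any class
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Assumption: classes absent from both prediction and target are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        return float("nan")
    return 100.0 * sum(ious) / len(ious)


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(round(mean_iou(pred, target, num_classes=3), 2))  # 72.22
```

In the toy example, the per-class IoUs are 1/2, 2/3, and 1/1, so the mean is about 72.22.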