Sanghyun Jo, In-Jae Yu, KyungSu Kim
Although weakly supervised semantic segmentation using only image-level labels (WSSS-IL) is potentially useful, its low performance and implementation complexity still limit its application. The main causes are (a) non-detection and (b) false detection: (a) the class activation maps refined by existing WSSS-IL methods still cover only partial regions of large-scale objects, and (b) for small-scale objects, over-activation causes the maps to deviate from the object edges. We propose RecurSeed, which alternately reduces non-detections and false detections through recursive iterations, thereby implicitly finding an optimal junction that minimizes both errors. We also propose a novel data augmentation (DA) approach called EdgePredictMix, which better delineates object edges by using the probability difference between adjacent pixels when combining segmentation results, thereby compensating for the shortcomings of existing DA methods when applied to WSSS. We achieve new state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO 2014 benchmarks (VOC val: 74.4%, COCO val: 46.4%). The code is available at https://github.com/shjo-april/RecurSeed_and_EdgePredictMix.
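To make the "probability difference information between adjacent pixels" concrete, the following minimal sketch computes such a difference map from a single-class probability map. It is not the EdgePredictMix algorithm itself; the function name `adjacent_prob_difference`, the choice of right and bottom neighbours, and the use of the maximum absolute difference are all assumptions made for illustration.

```python
def adjacent_prob_difference(prob):
    """Return a map of the largest absolute probability change between each
    pixel and its right/bottom neighbour (hypothetical helper, illustration only).

    prob: 2D list of per-pixel probabilities in [0, 1] for one class.
    """
    h = len(prob)
    w = len(prob[0]) if h else 0
    edge = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = 0.0
            if x + 1 < w:  # difference to the right neighbour
                d = max(d, abs(prob[y][x] - prob[y][x + 1]))
            if y + 1 < h:  # difference to the bottom neighbour
                d = max(d, abs(prob[y][x] - prob[y + 1][x]))
            edge[y][x] = d
    return edge


# Toy probability map: a confident object region in the top-right corner.
prob = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1],
]
edge = adjacent_prob_difference(prob)
```

In this toy map, large values appear only where the predicted probability jumps between neighbouring pixels, which is the kind of signal an edge-aware mixing strategy could exploit.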
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | COCO 2014 val | mIoU | 46.4 | RS+EPM (ResNet-101, multi-stage) |
| Semantic Segmentation | COCO 2014 val | mIoU | 42.2 | RS+EPM (ResNet-50, single-stage) |
| Semantic Segmentation | PASCAL VOC 2012 val | mIoU | 74.4 | RS+EPM (ResNet-101, multi-stage) |
| Semantic Segmentation | PASCAL VOC 2012 val | mIoU | 69.5 | RS+EPM (ResNet-50, single-stage) |
| Semantic Segmentation | PASCAL VOC 2012 test | mIoU | 73.6 | RS+EPM (ResNet-101, multi-stage) |
| Semantic Segmentation | PASCAL VOC 2012 test | mIoU | 70.6 | RS+EPM (ResNet-50, single-stage) |
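The values in the table are mean intersection-over-union (mIoU) scores in percent. As a rough sketch of how that metric is defined, the snippet below computes mIoU for one predicted label map against its ground truth. The function name `mean_iou` and the `ignore_index=255` default (the usual void label in PASCAL VOC annotations) are assumptions for illustration; benchmark evaluation accumulates intersections and unions over the whole dataset rather than per image.

```python
def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU in percent over classes that occur in pred or gt.

    pred, gt: 2D lists of integer class labels with the same shape.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if g == ignore_index:
                continue
            if p == g:
                inter[g] += 1
                union[g] += 1
            else:
                # A mismatch adds one pixel to the union of both classes.
                union[p] += 1
                union[g] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes (0 = background, 1 = object).
pred = [[0, 0, 1],
        [0, 1, 1]]
gt = [[0, 1, 1],
      [0, 1, 1]]
score = mean_iou(pred, gt, num_classes=2)
```

Here class 0 has IoU 2/3 and class 1 has IoU 3/4, so `score` is their average expressed as a percentage.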