Sanghyun Jo, Fei Pan, In-Jae Yu, KyungSu Kim
Weakly-supervised semantic segmentation (WSS) delivers high-quality segmentation with limited data and excels when its masks are used as input seeds for large-scale vision models such as Segment Anything. However, WSS struggles with minor classes, which are often overlooked in images containing multiple adjacent classes. This limitation originates from the overfitting of traditional expansion methods such as Random Walk. We first address this by employing unsupervised and weakly-supervised feature maps instead of these conventional expansion methods, allowing for hierarchical mask enhancement. The method first distinguishes higher-level classes and then separates the lower-level classes associated with each of them, ensuring that all classes, including minor ones, are correctly restored in the mask. Our approach, validated through extensive experiments, significantly improves WSS across five benchmarks (VOC: 79.8%, COCO: 53.9%, Context: 49.0%, ADE: 32.9%, Stuff: 37.4%) and reduces the gap with fully supervised methods by over 84% on the VOC validation set. Code is available at https://github.com/shjo-april/DHR.
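The two-stage idea (resolve higher-level classes first, then their lower-level classes) can be illustrated with a toy sketch. This is not the DHR implementation: the class hierarchy, the `hierarchical_labels` function, and its inputs (a per-pixel higher-level group map, assumed here to come from something like clustering unsupervised features, plus per-pixel class scores filtered by image-level labels) are all hypothetical simplifications. It only shows how restricting the second stage to the children of each higher-level region keeps a class from competing with unrelated neighboring classes.

```python
from typing import Dict, List

# Hypothetical two-level class hierarchy (illustrative only, not from the paper):
# each higher-level group maps to the lower-level classes it contains.
HIERARCHY: Dict[str, List[str]] = {
    "vehicle": ["car", "bus"],
    "animal": ["dog", "cat"],
}


def hierarchical_labels(
    group_map: List[List[str]],
    class_scores: List[List[Dict[str, float]]],
    image_classes: List[str],
) -> List[List[str]]:
    """Toy two-stage labeling.

    Stage 1 is assumed done: every pixel in `group_map` already carries a
    higher-level group label ("background" marks pixels outside all groups).
    Stage 2: inside each group region, pick the best-scoring lower-level class
    among that group's children that also appear in the image-level labels.
    """
    present = set(image_classes)
    out: List[List[str]] = []
    for row_groups, row_scores in zip(group_map, class_scores):
        out_row: List[str] = []
        for group, scores in zip(row_groups, row_scores):
            candidates = [c for c in HIERARCHY.get(group, []) if c in present]
            if not candidates:
                out_row.append("background")
                continue
            # Only sibling classes of the same higher-level group compete here.
            out_row.append(max(candidates, key=lambda c: scores.get(c, 0.0)))
        out.append(out_row)
    return out


# Toy 2x3 "image" whose image-level labels are car, bus, and dog (no cat).
groups = [
    ["vehicle", "vehicle", "animal"],
    ["background", "vehicle", "animal"],
]
scores = [
    [{"car": 0.9, "bus": 0.2}, {"car": 0.3, "bus": 0.6}, {"dog": 0.7, "cat": 0.4}],
    [{}, {"car": 0.8, "bus": 0.1}, {"dog": 0.1, "cat": 0.2}],
]
labels = hierarchical_labels(groups, scores, ["car", "bus", "dog"])
print(labels)
```

In the bottom-right pixel, "cat" scores higher than "dog" but is excluded because it is absent from the image-level labels, so the animal region is still assigned "dog".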
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Semantic Segmentation | COCO 2014 val | mIoU | 56.8 | DHR (Swin-L, Mask2Former) |
| Semantic Segmentation | COCO-Stuff val | mIoU | 37.4 | DHR (Swin-L, Mask2Former) |
| Semantic Segmentation | PASCAL VOC 2012 val | mIoU | 82.3 | DHR (Swin-L, Mask2Former) |
| Semantic Segmentation | PASCAL Context val | mIoU | 53.6 | DHR (Swin-L, Mask2Former) |
| Semantic Segmentation | PASCAL VOC 2012 test | mIoU | 82.3 | DHR (Swin-L, Mask2Former) |
| Semantic Segmentation | ADE20K val | mIoU | 32.9 | DHR (Swin-L, Mask2Former) |
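The table reports mean intersection-over-union (mIoU). As a hedged reference for how this metric is commonly computed, the sketch below accumulates a confusion matrix over all pixels and averages the per-class IoU = TP / (TP + FP + FN). The function name `mean_iou` and the ignore index of 255 (the usual PASCAL VOC convention for boundary pixels) are assumptions, and benchmark toolkits can differ in details such as how classes absent from both prediction and ground truth are handled.

```python
from typing import List, Sequence


def mean_iou(
    pred: Sequence[int],
    gt: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,
) -> float:
    """Return mIoU in percent for flattened label sequences.

    Predictions are assumed to lie in [0, num_classes); ground-truth pixels
    equal to `ignore_index` are skipped.
    """
    # Confusion matrix: rows index ground truth, columns index prediction.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue
        conf[g][p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp  # class-c pixels predicted as another class
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is ignored.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, gt, num_classes=3)
print(f"{miou:.2f}")  # prints 72.22 (per-class IoU 0.5, 2/3, 1.0)
```

For dataset-level scores, the confusion matrix is typically accumulated over the entire validation set before the per-class IoUs are averaged, rather than averaging per-image mIoU values.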