Xinqiao Zhao, Feilong Tang, Xiaoyang Wang, Jimin Xiao
Image-level weakly supervised semantic segmentation has received increasing attention due to its low annotation cost. Existing methods mainly rely on Class Activation Mapping (CAM) to obtain pseudo-labels for training semantic segmentation models. In this work, we are the first to demonstrate that a long-tailed distribution in the training data can cause the CAM calculated through classifier weights to be over-activated for head classes and under-activated for tail classes, due to the features shared among head and tail classes. This degrades pseudo-label quality and, in turn, final semantic segmentation performance. To address this issue, we propose a Shared Feature Calibration (SFC) method for CAM generation. Specifically, we leverage the class prototypes that carry positive shared features and propose a Multi-Scaled Distribution-Weighted (MSDW) consistency loss that narrows the gap between the CAMs generated through classifier weights and those generated through class prototypes during training. The MSDW loss counterbalances over-activation and under-activation by calibrating the shared features in head- and tail-class classifier weights. Experimental results show that our SFC significantly improves CAM boundaries and achieves new state-of-the-art performance. The project is available at https://github.com/Barrett-python/SFC.
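To make the two kinds of CAM mentioned in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a feature map of shape `(C, H, W)` and classifier weights of shape `(K, C)`. The function names, the masked-average-pooling prototypes, the cosine-similarity prototype CAM, and the plain L1 gap are all illustrative assumptions; in particular, the L1 gap is only a stand-in for the idea of a consistency term and is not the MSDW loss.

```python
import numpy as np

def cam_from_weights(features, weights):
    """Classifier-weight CAM. features: (C, H, W); weights: (K, C)."""
    cam = np.einsum("kc,chw->khw", weights, features)
    cam = np.maximum(cam, 0.0)  # keep positive evidence only
    peak = cam.reshape(cam.shape[0], -1).max(axis=1)[:, None, None]
    return cam / (peak + 1e-5)  # per-class scaling into [0, 1)

def cam_from_prototypes(features, seed_cam, threshold=0.5):
    """Prototype CAM (assumed recipe): masked average pooling, then cosine similarity."""
    c, h, w = features.shape
    flat = features.reshape(c, -1)                       # (C, HW)
    mask = seed_cam.reshape(seed_cam.shape[0], -1) > threshold
    mask = mask.astype(features.dtype)                   # (K, HW)
    # One prototype per class: mean feature over confidently activated pixels.
    protos = mask @ flat.T / (mask.sum(axis=1, keepdims=True) + 1e-5)
    flat_n = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-5)
    protos_n = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-5)
    sim = protos_n @ flat_n                              # (K, HW) cosine similarity
    return np.maximum(sim, 0.0).reshape(-1, h, w)

def consistency_gap(cam_w, cam_p):
    """Plain L1 gap between the two CAMs (a simplified stand-in, not MSDW)."""
    return float(np.abs(cam_w - cam_p).mean())

# Toy inputs: 8 feature channels, a 4x4 map, 3 classes.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
weights = rng.standard_normal((3, 8))

cam_w = cam_from_weights(features, weights)
cam_p = cam_from_prototypes(features, cam_w)
gap = consistency_gap(cam_w, cam_p)
```

In a training setup, a term like `gap` would be minimized alongside the classification loss so that the two CAMs agree; how that term is weighted across scales and class frequencies is what the MSDW loss specifies and is not reproduced here.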
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | PASCAL VOC 2012 val | Mean IoU | 71.2 | SFC(ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 test | Mean IoU | 72.5 | SFC(ResNet-101) |