Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chenyao Wang, Shu Liu, Jingyong Su, Jiaya Jia
Few-shot semantic segmentation (FSS) aims to build class-agnostic models that segment unseen classes given only a handful of annotations. Previous methods, limited to semantic features and prototype representations, suffer from coarse segmentation granularity and overfitting to the training set. In this work, we design the Hierarchically Decoupled Matching Network (HDMNet), which mines pixel-level support correlation based on the transformer architecture. Self-attention modules help establish hierarchical dense features, which are then used to perform cascaded matching between query and support features. Moreover, we propose a matching module to reduce train-set overfitting and introduce correlation distillation, which leverages semantic correspondence from coarse resolution to boost fine-grained segmentation. In experiments, our method achieves $50.0\%$ mIoU on the COCO-$20^i$ dataset in the one-shot setting and $56.0\%$ in the five-shot setting.
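The idea of pixel-level support correlation and correlation distillation can be illustrated with a toy sketch. It assumes features are plain lists of vectors (one per pixel), computes a cosine-similarity correlation from each query pixel to the foreground support pixels (selected by the support annotation mask), normalizes it with a softmax, and uses a KL divergence to pull a fine-level correlation toward a coarse-level one. The function names, the temperature parameter, the KL direction, and the assumption that both correlation maps are already resampled to the same positions are illustrative choices, not HDMNet's actual implementation, which builds its features with transformer self-attention modules.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature gives a sharper distribution."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def support_correlation(query_feats, support_feats, support_mask, temperature=1.0):
    """For every query pixel, a distribution over the foreground support pixels.

    Hypothetical helper: support pixels outside the annotation mask are dropped,
    so the correlation only encodes matches to the target class.
    """
    fg = [f for f, m in zip(support_feats, support_mask) if m]
    if not fg:
        raise ValueError("support mask has no foreground pixels")
    return [softmax([cosine(q, s) for s in fg], temperature) for q in query_feats]


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def correlation_distillation_loss(coarse_corr, fine_corr):
    """Mean KL(coarse || fine) over query pixels.

    Assumption (not from the source): the coarse correlation acts as the fixed
    teacher, and both maps were already resampled to the same query and
    support positions.
    """
    losses = [kl_divergence(c, f) for c, f in zip(coarse_corr, fine_corr)]
    return sum(losses) / len(losses)
```

For example, with query features `[[1.0, 0.0], [0.0, 1.0]]`, support features `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` and support mask `[1, 1, 0]`, each query pixel gets a two-entry distribution that puts more weight on the support pixel it most resembles; the third support pixel is ignored because it lies outside the mask.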
The following HDMNet results are listed identically under three tasks: Few-Shot Learning, Few-Shot Semantic Segmentation, and Meta-Learning ("-" = not listed).

| Model | Dataset | Setting | Mean IoU (%) | FB-IoU (%) |
|---|---|---|---|---|
| HDMNet (VGG-16) | PASCAL-5i | 1-shot | 65.1 | - |
| HDMNet (ResNet-50) | PASCAL-5i | 1-shot | 69.4 | - |
| HDMNet (VGG-16) | PASCAL-5i | 5-shot | 69.3 | - |
| HDMNet (ResNet-50) | PASCAL-5i | 5-shot | 71.8 | - |
| HDMNet (VGG-16) | COCO-20i | 1-shot | 45.9 | - |
| HDMNet (ResNet-50) | COCO-20i | 1-shot | 50.0 | 72.2 |
| HDMNet (VGG-16) | COCO-20i | 5-shot | 52.4 | - |
| HDMNet (ResNet-50) | COCO-20i | 5-shot | 56.0 | 77.7 |