Amirreza Fateh, Mohammad Reza Mohammadi, Mohammad Reza Jahed Motlagh
Few-shot semantic segmentation addresses the challenge of segmenting objects in query images with only a handful of annotated examples. However, many previous state-of-the-art methods either discard intricate local semantic features or suffer from high computational complexity. To address these challenges, we propose a new few-shot semantic segmentation framework based on the transformer architecture. Our approach introduces a spatial transformer decoder and a contextual mask generation module to improve the relational understanding between support and query images. Moreover, we introduce a multi-scale decoder that refines the segmentation mask by hierarchically incorporating features from different resolutions. Additionally, our approach integrates global features from intermediate encoder stages to improve contextual understanding, while maintaining a lightweight structure to reduce complexity. This balance between performance and efficiency enables our method to achieve state-of-the-art results on benchmark datasets such as PASCAL-$5^i$ and COCO-$20^i$ in both 1-shot and 5-shot settings. Notably, our model, with only 1.5 million learnable parameters, demonstrates competitive performance while overcoming limitations of existing methodologies. Code: https://github.com/amirrezafateh/MSDNet
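The superscript $i$ in PASCAL-$5^i$ denotes a cross-validation fold: the 20 PASCAL VOC classes are divided into 4 folds of 5 classes, and the classes of fold $i$ are held out for testing while the remaining 15 are used for training. The sketch below illustrates that split under the commonly used convention that fold $i$ holds classes $5i+1$ to $5i+5$. The function name `pascal5i_split` is hypothetical, and the MSDNet repository may organize its folds differently.

```python
def pascal5i_split(fold, num_classes=20, num_folds=4):
    """Return (train_classes, test_classes) for one PASCAL-5^i fold.

    Assumes the common convention that fold i holds out the contiguous
    class ids 5i+1 .. 5i+5 (class ids are 1-based, 0 is background).
    """
    if not 0 <= fold < num_folds:
        raise ValueError(f"fold must be in [0, {num_folds - 1}], got {fold}")
    per_fold = num_classes // num_folds
    # Held-out (novel) classes for this fold.
    test_classes = list(range(per_fold * fold + 1, per_fold * (fold + 1) + 1))
    # All remaining classes are used as base classes for training.
    train_classes = [c for c in range(1, num_classes + 1) if c not in test_classes]
    return train_classes, test_classes


if __name__ == "__main__":
    for fold in range(4):
        train, test = pascal5i_split(fold)
        print(f"fold {fold}: test classes {test}, {len(train)} training classes")
```

Benchmark numbers on such datasets are typically averaged over all four folds, so a single reported Mean IoU summarizes four separate train and test runs.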
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 75.1 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 55.3 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 74.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 54.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 73.9 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 72.1 | MSDNet (ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.3 | MSDNet (ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | Mean IoU | 64.7 | MSDNet (ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.1 | MSDNet (ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | Mean IoU | 64.3 | MSDNet (ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 71.3 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 48.5 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 70.4 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 46.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | FB-IoU | 85 | MSDNet (ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | Mean IoU | 70.8 | MSDNet (ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | FB-IoU | 82.1 | MSDNet (ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | Mean IoU | 68.7 | MSDNet (ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 76.4 | MSDNet (ResNet-101) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 74.2 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 75.1 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 55.3 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 74.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 54.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 73.9 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 72.1 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | FB-IoU | 77.3 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | Mean IoU | 64.7 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | FB-IoU | 77.1 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | Mean IoU | 64.3 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 71.3 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 48.5 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 70.4 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 46.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | FB-IoU | 85 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | Mean IoU | 70.8 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | FB-IoU | 82.1 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | Mean IoU | 68.7 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 76.4 | MSDNet (ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 74.2 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 75.1 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 55.3 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 74.5 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 54.5 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 73.9 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 72.1 | MSDNet (ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.3 | MSDNet (ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | Mean IoU | 64.7 | MSDNet (ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.1 | MSDNet (ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | Mean IoU | 64.3 | MSDNet (ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 71.3 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 48.5 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 70.4 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 46.5 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | FB-IoU | 85 | MSDNet (ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | Mean IoU | 70.8 | MSDNet (ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | FB-IoU | 82.1 | MSDNet (ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | Mean IoU | 68.7 | MSDNet (ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | learnable parameters (million) | 1.5 | MSDNet (ResNet-50) |
| Meta-Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 76.4 | MSDNet (ResNet-101) |
| Meta-Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 74.2 | MSDNet (ResNet-50) |
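
The table reports two metrics, Mean IoU and FB-IoU. As a rough illustration of how they differ, the sketch below assumes the convention common in few-shot segmentation benchmarks: Mean IoU averages the foreground IoU over the test classes, while FB-IoU ignores class identity and averages the foreground IoU and the background IoU. The function names and the flat-list mask format are assumptions made for this example, not the evaluation code used by the authors.

```python
from collections import defaultdict


def iou_counts(pred, target, label):
    """Return (intersection, union) pixel counts for `label` in two flat masks."""
    inter = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, target) if p == label or t == label)
    return inter, union


def evaluate(episodes):
    """Compute (mean_iou, fb_iou) over test episodes.

    episodes: iterable of (class_id, pred_mask, target_mask), where each
    mask is a flat list of 0 (background) and 1 (foreground) values.
    """
    per_class = defaultdict(lambda: [0, 0])  # class_id -> [intersection, union]
    fb = {0: [0, 0], 1: [0, 0]}              # background / foreground totals
    for class_id, pred, target in episodes:
        for label in (0, 1):
            inter, union = iou_counts(pred, target, label)
            fb[label][0] += inter
            fb[label][1] += union
            if label == 1:
                per_class[class_id][0] += inter
                per_class[class_id][1] += union
    # Mean IoU: average of per-class foreground IoU.
    class_ious = [i / u for i, u in per_class.values() if u > 0]
    mean_iou = sum(class_ious) / len(class_ious) if class_ious else 0.0
    # FB-IoU: average of foreground IoU and background IoU, classes ignored.
    fb_ious = [i / u for i, u in fb.values() if u > 0]
    fb_iou = sum(fb_ious) / len(fb_ious) if fb_ious else 0.0
    return mean_iou, fb_iou


if __name__ == "__main__":
    episodes = [
        (1, [1, 1, 0, 0], [1, 0, 0, 0]),
        (2, [0, 1, 1, 0], [0, 1, 1, 0]),
    ]
    mean_iou, fb_iou = evaluate(episodes)
    print(f"Mean IoU: {mean_iou:.3f}, FB-IoU: {fb_iou:.3f}")
```

Because background regions are usually large and easy to segment, FB-IoU tends to be higher than Mean IoU, which matches the pattern in the table.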