Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia
We study few-shot semantic segmentation, which aims to segment a target object in a query image given only a few annotated support images of the target class. Several recent methods rely on feature masking (FM) to discard irrelevant feature activations, which in turn supports reliable prediction of the segmentation mask. A fundamental limitation of FM is its inability to preserve the fine-grained spatial details that affect the accuracy of the segmentation mask, especially for small target objects. In this paper, we develop a simple, effective, and efficient approach to enhance FM, which we call hybrid masking (HM). Specifically, we compensate for the loss of fine-grained spatial details in FM by investigating and leveraging a complementary basic input masking method. We conduct experiments on three publicly available benchmarks with strong few-shot segmentation (FSS) baselines, and empirically show improved performance over current state-of-the-art methods by visible margins across the benchmarks. Our code and trained models are available at: https://github.com/moonsh/HM-Hybrid-Masking
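The contrast between feature masking and input masking can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "backbone" is a plain 2x2 average pooling, the mask is downsampled by nearest-neighbour sampling, and the hybrid rule (keep FM activations where they are non-zero, otherwise fall back to the input-masked activations) is one plausible way to combine the two, not necessarily the rule used in the released code. The function names are hypothetical.

```python
def avg_pool2(img):
    """Toy stand-in for a backbone: 2x2 average pooling (halves the resolution)."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def downsample_mask(mask):
    """Nearest-neighbour downsampling by 2: keep the top-left pixel of each 2x2 block."""
    return [row[::2] for row in mask[::2]]


def feature_masking(img, mask):
    """FM: extract features from the full image, then apply the downsampled mask."""
    feat = avg_pool2(img)
    small = downsample_mask(mask)
    return [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(feat, small)]


def input_masking(img, mask):
    """IM: apply the full-resolution mask to the image, then extract features."""
    masked = [[p * m for p, m in zip(p_row, m_row)] for p_row, m_row in zip(img, mask)]
    return avg_pool2(masked)


def hybrid_masking(img, mask):
    """Assumed hybrid rule: keep FM where it is active, otherwise use IM."""
    fm = feature_masking(img, mask)
    im = input_masking(img, mask)
    return [[f if f != 0 else i for f, i in zip(f_row, i_row)]
            for f_row, i_row in zip(fm, im)]


# A 4x4 image with a one-pixel "object" that the downsampled mask misses.
img = [[8.0] * 4 for _ in range(4)]
mask = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

print(feature_masking(img, mask))  # [[0.0, 0.0], [0.0, 0.0]]
print(hybrid_masking(img, mask))   # [[2.0, 0.0], [0.0, 0.0]]
```

In this toy case the one-pixel object disappears entirely under FM because the low-resolution mask no longer covers it, while the input-masked path still carries a trace of it, which the hybrid output retains.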
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Few-Shot Learning | FSS-1000 (5-shot) | Mean IoU | 90.5 | VAT (HM, ResNet-101) |
| Few-Shot Learning | FSS-1000 (5-shot) | Mean IoU | 89.9 | VAT (HM, ResNet-50) |
| Few-Shot Learning | FSS-1000 (5-shot) | Mean IoU | 88.5 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | FSS-1000 (5-shot) | Mean IoU | 88.0 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 73.3 | ASNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 50.6 | ASNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 72.9 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 50.6 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 72.2 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 49.4 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 72.2 | ASNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 48.4 | ASNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | FB-IoU | 71.8 | VAT (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (5-shot) | Mean IoU | 48.3 | VAT (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 66.5 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 65.2 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 65.1 | VAT (HM, ResNet-50) |
| Few-Shot Learning | FSS-1000 (1-shot) | Mean IoU | 90.2 | VAT (HM, ResNet-101) |
| Few-Shot Learning | FSS-1000 (1-shot) | Mean IoU | 89.4 | VAT (HM, ResNet-50) |
| Few-Shot Learning | FSS-1000 (1-shot) | Mean IoU | 87.8 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | FSS-1000 (1-shot) | Mean IoU | 87.1 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | FB-IoU | 79.4 | VAT (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | Mean IoU | 67.8 | VAT (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.8 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | Mean IoU | 66.7 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.1 | VAT (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | Mean IoU | 65.8 | VAT (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | FB-IoU | 76.5 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (1-Shot) | Mean IoU | 65.0 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 71.5 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 46.5 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 71.1 | ASNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 45.9 | ASNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 70.4 | ASNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 44.7 | ASNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 70.8 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 44.3 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | FB-IoU | 70.0 | VAT (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i (1-shot) | Mean IoU | 43.2 | VAT (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | FB-IoU | 81.5 | VAT (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | Mean IoU | 70.9 | VAT (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | FB-IoU | 79.7 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | Mean IoU | 69.3 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | FB-IoU | 78.5 | VAT (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | Mean IoU | 68.2 | VAT (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | FB-IoU | 77.7 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | PASCAL-5i (5-Shot) | Mean IoU | 67.1 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 70.9 | HSNet (HM, ResNet-101) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 69.7 | HSNet (HM, ResNet-50) |
| Few-Shot Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 69.7 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | FSS-1000 (5-shot) | Mean IoU | 90.5 | VAT (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | FSS-1000 (5-shot) | Mean IoU | 89.9 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | FSS-1000 (5-shot) | Mean IoU | 88.5 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | FSS-1000 (5-shot) | Mean IoU | 88.0 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 73.3 | ASNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 50.6 | ASNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 72.9 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 50.6 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 72.2 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 49.4 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 72.2 | ASNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 48.4 | ASNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | FB-IoU | 71.8 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (5-shot) | Mean IoU | 48.3 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 66.5 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 65.2 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 65.1 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | FSS-1000 (1-shot) | Mean IoU | 90.2 | VAT (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | FSS-1000 (1-shot) | Mean IoU | 89.4 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | FSS-1000 (1-shot) | Mean IoU | 87.8 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | FSS-1000 (1-shot) | Mean IoU | 87.1 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | FB-IoU | 79.4 | VAT (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | Mean IoU | 67.8 | VAT (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | FB-IoU | 77.8 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | Mean IoU | 66.7 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | FB-IoU | 77.1 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | Mean IoU | 65.8 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | FB-IoU | 76.5 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (1-Shot) | Mean IoU | 65.0 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 71.5 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 46.5 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 71.1 | ASNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 45.9 | ASNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 70.4 | ASNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 44.7 | ASNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 70.8 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 44.3 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | FB-IoU | 70.0 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i (1-shot) | Mean IoU | 43.2 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | FB-IoU | 81.5 | VAT (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | Mean IoU | 70.9 | VAT (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | FB-IoU | 79.7 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | Mean IoU | 69.3 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | FB-IoU | 78.5 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | Mean IoU | 68.2 | VAT (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | FB-IoU | 77.7 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | PASCAL-5i (5-Shot) | Mean IoU | 67.1 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 70.9 | HSNet (HM, ResNet-101) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 69.7 | HSNet (HM, ResNet-50) |
| Few-Shot Semantic Segmentation | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 69.7 | VAT (HM, ResNet-50) |
| Meta-Learning | FSS-1000 (5-shot) | Mean IoU | 90.5 | VAT (HM, ResNet-101) |
| Meta-Learning | FSS-1000 (5-shot) | Mean IoU | 89.9 | VAT (HM, ResNet-50) |
| Meta-Learning | FSS-1000 (5-shot) | Mean IoU | 88.5 | HSNet (HM, ResNet-101) |
| Meta-Learning | FSS-1000 (5-shot) | Mean IoU | 88.0 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 73.3 | ASNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 50.6 | ASNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 72.9 | HSNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 50.6 | HSNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 72.2 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 49.4 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 72.2 | ASNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 48.4 | ASNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | FB-IoU | 71.8 | VAT (HM, ResNet-50) |
| Meta-Learning | COCO-20i (5-shot) | Mean IoU | 48.3 | VAT (HM, ResNet-50) |
| Meta-Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 66.5 | HSNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 65.2 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i -> Pascal VOC (1-shot) | Mean IoU | 65.1 | VAT (HM, ResNet-50) |
| Meta-Learning | FSS-1000 (1-shot) | Mean IoU | 90.2 | VAT (HM, ResNet-101) |
| Meta-Learning | FSS-1000 (1-shot) | Mean IoU | 89.4 | VAT (HM, ResNet-50) |
| Meta-Learning | FSS-1000 (1-shot) | Mean IoU | 87.8 | HSNet (HM, ResNet-101) |
| Meta-Learning | FSS-1000 (1-shot) | Mean IoU | 87.1 | HSNet (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | FB-IoU | 79.4 | VAT (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | Mean IoU | 67.8 | VAT (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.8 | HSNet (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | Mean IoU | 66.7 | HSNet (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (1-Shot) | FB-IoU | 77.1 | VAT (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | Mean IoU | 65.8 | VAT (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | FB-IoU | 76.5 | HSNet (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (1-Shot) | Mean IoU | 65.0 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 71.5 | HSNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 46.5 | HSNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 71.1 | ASNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 45.9 | ASNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 70.4 | ASNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 44.7 | ASNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 70.8 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 44.3 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | FB-IoU | 70.0 | VAT (HM, ResNet-50) |
| Meta-Learning | COCO-20i (1-shot) | Mean IoU | 43.2 | VAT (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | FB-IoU | 81.5 | VAT (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | Mean IoU | 70.9 | VAT (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | FB-IoU | 79.7 | HSNet (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | Mean IoU | 69.3 | HSNet (HM, ResNet-101) |
| Meta-Learning | PASCAL-5i (5-Shot) | FB-IoU | 78.5 | VAT (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | Mean IoU | 68.2 | VAT (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | FB-IoU | 77.7 | HSNet (HM, ResNet-50) |
| Meta-Learning | PASCAL-5i (5-Shot) | Mean IoU | 67.1 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 70.9 | HSNet (HM, ResNet-101) |
| Meta-Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 69.7 | HSNet (HM, ResNet-50) |
| Meta-Learning | COCO-20i -> Pascal VOC (5-shot) | Mean IoU | 69.7 | VAT (HM, ResNet-50) |
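
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how they are commonly computed for a single binary prediction. In the few-shot segmentation literature, mean IoU usually averages foreground IoU over the test classes, and FB-IoU averages foreground and background IoU while ignoring class identity; the exact evaluation protocol behind the values above is not stated here, so treat this as an illustration only. The function names and the convention for an empty union are assumptions.

```python
def iou(pred, target, cls=1):
    """Intersection over union of the pixels labelled `cls` in two 2D label maps."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_hit = p == cls
            t_hit = t == cls
            inter += int(p_hit and t_hit)
            union += int(p_hit or t_hit)
    # Assumed convention: if the class appears in neither map, count it as a perfect match.
    return inter / union if union else 1.0


def fb_iou(pred, target):
    """Average of foreground (label 1) and background (label 0) IoU."""
    return (iou(pred, target, cls=1) + iou(pred, target, cls=0)) / 2.0


pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [0, 0]]

print(iou(pred, target))     # 0.5
print(fb_iou(pred, target))  # about 0.583 (average of 1/2 and 2/3)
```

The table reports these quantities as percentages, so a foreground IoU of 0.5 would appear as 50.0.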