Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Xinyu Shi, Dong Wei, Yu Zhang, Donghuan Lu, Munan Ning, Jiashun Chen, Kai Ma, Yefeng Zheng

Published: 2022-07-18
Tasks: Segmentation · Few-Shot Semantic Segmentation · Semantic Segmentation
Links: Paper · PDF · Code (official)

Abstract

Research into Few-shot Semantic Segmentation (FSS) has attracted great attention, with the goal of segmenting target objects in a query image given only a few annotated support images of the target class. A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images. However, most existing approaches either compress the support information into a few class-wise prototypes or use partial support information (e.g., only the foreground) at the pixel level, causing non-negligible information loss. In this paper, we propose Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), where both foreground and background support information are fully exploited via multi-level pixel-wise correlations between paired query and support features. Implemented with the scaled dot-product attention in the Transformer architecture, DCAMA treats every query pixel as a token, computes its similarities with all support pixels, and predicts its segmentation label as an additive aggregation of all the support pixels' labels -- weighted by the similarities. Based on the unique formulation of DCAMA, we further propose efficient and effective one-pass inference for n-shot segmentation, where pixels of all support images are collected for the mask aggregation at once. Experiments show that our DCAMA significantly advances the state of the art on the standard FSS benchmarks PASCAL-5i, COCO-20i, and FSS-1000, e.g., with 3.1%, 9.7%, and 3.6% absolute improvements in 1-shot mIoU over the previous best records. Ablation studies also verify the design of DCAMA.
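The aggregation step described in the abstract -- query pixels as tokens, scaled dot-product similarities against all support pixels, and a similarity-weighted sum of the support pixels' foreground/background labels -- can be sketched as follows. This is an illustrative simplification under stated assumptions: a single feature level, NumPy instead of the paper's Transformer implementation, and flat (pixels, channels) tensors; the function names are hypothetical, not from the authors' code.

```python
import numpy as np

def dcama_mask_aggregation(query_feats, support_feats, support_mask, eps=1e-8):
    """Attention-weighted mask aggregation (illustrative sketch).

    query_feats:   (Nq, d) -- one feature vector per query pixel (token)
    support_feats: (Ns, d) -- one feature vector per support pixel
    support_mask:  (Ns,)   -- binary fg/bg label of each support pixel
    Returns (Nq,) soft foreground scores for the query pixels.
    """
    d = query_feats.shape[1]
    # Scaled dot-product similarity between every query and support pixel.
    logits = query_feats @ support_feats.T / np.sqrt(d)      # (Nq, Ns)
    # Softmax over the support pixels gives the attention weights.
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True) + eps
    # Each query pixel's score: similarity-weighted sum of support labels.
    return weights @ support_mask                            # (Nq,)

def dcama_n_shot(query_feats, support_feats_list, support_mask_list):
    """One-pass n-shot inference: pool the pixels of all support images
    and aggregate over them at once, as the abstract describes."""
    feats = np.concatenate(support_feats_list, axis=0)
    masks = np.concatenate(support_mask_list, axis=0)
    return dcama_mask_aggregation(query_feats, feats, masks)
```

Because the softmax weights for each query pixel sum to (nearly) one and the support labels are in {0, 1}, the output is a soft mask in [0, 1]; n-shot inference needs no per-shot forward passes or result averaging, only a larger support-pixel pool.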

Results

Task: Few-Shot Learning

| Dataset | Metric | Value | Model |
|---|---|---|---|
| FSS-1000 (5-shot) | FB-IoU | 94.1 | DCAMA (Swin-B) |
| FSS-1000 (5-shot) | Mean IoU | 90.4 | DCAMA (Swin-B) |
| FSS-1000 (5-shot) | FB-IoU | 93.1 | DCAMA (ResNet-101) |
| FSS-1000 (5-shot) | Mean IoU | 89.1 | DCAMA (ResNet-101) |
| FSS-1000 (5-shot) | FB-IoU | 92.9 | DCAMA (ResNet-50) |
| FSS-1000 (5-shot) | Mean IoU | 88.8 | DCAMA (ResNet-50) |
| COCO-20i (5-shot) | FB-IoU | 76.9 | DCAMA (Swin-B) |
| COCO-20i (5-shot) | Mean IoU | 58.3 | DCAMA (Swin-B) |
| COCO-20i (5-shot) | FB-IoU | 73.3 | DCAMA (ResNet-101) |
| COCO-20i (5-shot) | Mean IoU | 51.9 | DCAMA (ResNet-101) |
| COCO-20i (5-shot) | Learnable parameters (M) | 47.7 | DCAMA (ResNet-101) |
| COCO-20i (5-shot) | FB-IoU | 71.7 | DCAMA (ResNet-50) |
| COCO-20i (5-shot) | Mean IoU | 48.3 | DCAMA (ResNet-50) |
| COCO-20i (5-shot) | Learnable parameters (M) | 47.7 | DCAMA (ResNet-50) |
| COCO-20i (2-way 1-shot) | mIoU | 31.7 | DCAMA (Swin-B) |
| FSS-1000 (1-shot) | FB-IoU | 93.8 | DCAMA (Swin-B) |
| FSS-1000 (1-shot) | Mean IoU | 90.1 | DCAMA (Swin-B) |
| FSS-1000 (1-shot) | FB-IoU | 92.4 | DCAMA (ResNet-101) |
| FSS-1000 (1-shot) | Mean IoU | 88.3 | DCAMA (ResNet-101) |
| FSS-1000 (1-shot) | FB-IoU | 92.5 | DCAMA (ResNet-50) |
| FSS-1000 (1-shot) | Mean IoU | 88.2 | DCAMA (ResNet-50) |
| PASCAL-5i (1-shot) | FB-IoU | 78.5 | DCAMA (Swin-B) |
| PASCAL-5i (1-shot) | Mean IoU | 69.3 | DCAMA (Swin-B) |
| PASCAL-5i (1-shot) | FB-IoU | 75.7 | DCAMA (ResNet-50) |
| PASCAL-5i (1-shot) | Mean IoU | 64.6 | DCAMA (ResNet-50) |
| PASCAL-5i (1-shot) | FB-IoU | 77.6 | DCAMA (ResNet-101) |
| COCO-20i (1-shot) | FB-IoU | 73.2 | DCAMA (Swin-B) |
| COCO-20i (1-shot) | Mean IoU | 50.9 | DCAMA (Swin-B) |
| COCO-20i (1-shot) | FB-IoU | 69.9 | DCAMA (ResNet-101) |
| COCO-20i (1-shot) | Mean IoU | 43.5 | DCAMA (ResNet-101) |
| COCO-20i (1-shot) | Learnable parameters (M) | 47.7 | DCAMA (ResNet-101) |
| COCO-20i (1-shot) | FB-IoU | 69.5 | DCAMA (ResNet-50) |
| COCO-20i (1-shot) | Mean IoU | 43.3 | DCAMA (ResNet-50) |
| PASCAL-5i (5-shot) | FB-IoU | 82.9 | DCAMA (Swin-B) |
| PASCAL-5i (5-shot) | Mean IoU | 74.9 | DCAMA (Swin-B) |
| PASCAL-5i (5-shot) | FB-IoU | 79.5 | DCAMA (ResNet-50) |
| PASCAL-5i (5-shot) | Mean IoU | 68.5 | DCAMA (ResNet-50) |
| PASCAL-5i (5-shot) | FB-IoU | 80.8 | DCAMA (ResNet-101) |
| PASCAL-5i (5-shot) | Mean IoU | 68.3 | DCAMA (ResNet-101) |
The same result rows are also indexed under the Few-Shot Semantic Segmentation and Meta-Learning tasks; the values are identical to the Few-Shot Learning block above.

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
- Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)