Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Feature-Proxy Transformer for Few-Shot Segmentation

Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

2022-10-13 · Tasks: Segmentation, Few-Shot Semantic Segmentation, Semantic Segmentation

Abstract

Few-shot segmentation (FSS) aims to perform semantic segmentation on novel classes given a few annotated support samples. Rethinking recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while supervised segmentation methods use a simple linear classification head. Because of the intricacy of the decoder and its matching pipeline, such an FSS framework is not easy to follow. This paper revives the straightforward framework of "feature extractor + linear classification head" and proposes a novel Feature-Proxy Transformer (FPTrans) method, in which a "proxy" is the vector representing a semantic class in the linear classification head. FPTrans has two key points for learning discriminative features and representative proxies: 1) to better utilize the limited support samples, the feature extractor makes the query interact with the support features from the bottom to the top layers using a novel prompting strategy; 2) FPTrans uses multiple local background proxies (instead of a single one) because the background is not homogeneous and may contain some novel foreground regions. These two key points are easily integrated into the vision transformer backbone through the transformer's prompting mechanism. Given the learned features and proxies, FPTrans directly compares their cosine similarity for segmentation. Although the framework is straightforward, we show that FPTrans achieves FSS accuracy competitive with state-of-the-art decoder-based methods.
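The classification step described above (compare pixel features with class proxies by cosine similarity, using one foreground proxy and several local background proxies) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not FPTrans's actual implementation: the features here are plain arrays rather than prompted ViT features, the foreground proxy is assumed to be a masked average of support features, the local background regions are assumed to be a simple grid split, and a query pixel is scored as background by its best-matching background proxy. All function names are hypothetical.

```python
import numpy as np

def masked_average(features, mask):
    """Average the C-dim feature vectors where mask is True.
    features: (H, W, C); mask: (H, W) bool with at least one True."""
    return features[mask].mean(axis=0)

def local_background_proxies(features, fg_mask, grid=2):
    """Hypothetical region scheme (an assumption, not the paper's): split the
    support image into grid x grid cells and average the background features
    in every cell that contains background pixels. Returns (K, C)."""
    H, W, _ = features.shape
    proxies = []
    for rows in np.array_split(np.arange(H), grid):
        for cols in np.array_split(np.arange(W), grid):
            cell = np.zeros((H, W), dtype=bool)
            cell[np.ix_(rows, cols)] = True
            bg_cell = cell & ~fg_mask
            if bg_cell.any():
                proxies.append(masked_average(features, bg_cell))
    return np.stack(proxies)

def cosine_scores(features, proxies):
    """Cosine similarity of every pixel feature (H, W, C) with every
    proxy (P, C). Returns (H, W, P)."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = proxies / (np.linalg.norm(proxies, axis=-1, keepdims=True) + 1e-8)
    return f @ p.T

def segment_query(query_feats, support_feats, support_mask, grid=2):
    """Predict a boolean foreground mask for the query image."""
    fg_proxy = masked_average(support_feats, support_mask)[None, :]         # (1, C)
    bg_proxies = local_background_proxies(support_feats, support_mask, grid)  # (K, C)
    fg_score = cosine_scores(query_feats, fg_proxy)[..., 0]                 # (H, W)
    # A pixel counts as background if it matches ANY local background proxy.
    bg_score = cosine_scores(query_feats, bg_proxies).max(axis=-1)          # (H, W)
    return fg_score > bg_score
```

Taking the maximum over several background proxies lets a query pixel be claimed by whichever local background region it resembles most, which mirrors the abstract's motivation that the background is not homogeneous; a single averaged background proxy would blur those regions together.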

Results

Metric: Mean IoU. The same results are listed under three tasks: Few-Shot Learning, Few-Shot Semantic Segmentation, and Meta-Learning.

Dataset                | Setting | FPTrans (ViT-B/16) | FPTrans (DeiT-B/16)
-----------------------|---------|--------------------|--------------------
PASCAL-5i              | 1-shot  | 64.7               | 68.8
PASCAL-5i              | 5-shot  | 73.7               | 78.0
COCO-20i               | 1-shot  | 42.0               | 47.0
COCO-20i               | 5-shot  | 53.8               | 58.9
COCO-20i -> Pascal VOC | 1-shot  | 67.6               | 69.7
COCO-20i -> Pascal VOC | 5-shot  | 76.9               | 79.3
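The results above are reported as mean IoU. As a rough illustration of how that number is commonly computed in few-shot segmentation benchmarks, the sketch below assumes the usual protocol of accumulating foreground intersection and union per novel class over all test episodes, then averaging the per-class IoU. This is an assumption about the evaluation protocol, not FPTrans's evaluation code; benchmarks such as PASCAL-5i typically also average this value across folds, which is not shown. `fss_mean_iou` is a hypothetical helper.

```python
import numpy as np
from collections import defaultdict

def fss_mean_iou(episodes):
    """episodes: iterable of (class_id, pred_mask, gt_mask), masks as bool arrays.

    Sums foreground intersection and union per class across all episodes,
    computes IoU = intersection / union for each class, and returns the
    mean over classes as a percentage."""
    inter = defaultdict(int)
    union = defaultdict(int)
    for cls, pred, gt in episodes:
        inter[cls] += int(np.logical_and(pred, gt).sum())
        union[cls] += int(np.logical_or(pred, gt).sum())
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    return 100.0 * float(np.mean(ious))
```

Accumulating counts per class before dividing (rather than averaging per-episode IoU) means episodes with large objects weigh more within a class, while every class still contributes equally to the final mean.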

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)