Cost Aggregation Is All You Need for Few-Shot Segmentation

Sunghwan Hong, Seokju Cho, Jisu Nam, Seungryong Kim

2021-12-22 · Semantic correspondence · Segmentation · Few-Shot Semantic Segmentation

Abstract

We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), that tackles the few-shot segmentation task by using both convolutions and transformers to efficiently handle high-dimensional correlation maps between query and support. Specifically, we propose an encoder consisting of a volume embedding module, which not only transforms the correlation maps into a more tractable size but also injects some convolutional inductive bias, and a volumetric transformer module for cost aggregation. Our encoder has a pyramidal structure that lets the coarser-level aggregation guide the finer level and enforces learning of complementary matching scores. We then feed the output into our affinity-aware decoder along with the projected feature maps to guide the segmentation process. Combining these components, we conduct experiments to demonstrate the effectiveness of the proposed method, which sets a new state of the art on all the standard benchmarks for the few-shot segmentation task. Furthermore, we find that the proposed method attains state-of-the-art performance even on the standard benchmarks for the semantic correspondence task, although it was not specifically designed for this task. We also provide an extensive ablation study to validate our architectural choices. The trained weights and code are available at: https://seokju-cho.github.io/VAT/.
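To make the "high-dimensional correlation maps between query and support" concrete, the following is a minimal NumPy sketch of one common way such a map can be built: cosine similarity between every query position and every support position, giving a 4D volume. It is an illustration only, not the authors' implementation. The function name `correlation_map`, the feature shapes, and the clamping of negative scores to zero are all assumptions made for this example.

```python
import numpy as np


def correlation_map(query_feat, support_feat, eps=1e-8):
    """Cosine-similarity correlation between all query and support positions.

    query_feat:   array of shape (C, Hq, Wq)
    support_feat: array of shape (C, Hs, Ws)
    returns:      array of shape (Hq, Wq, Hs, Ws)

    Hypothetical helper for illustration; not taken from the VAT code base.
    """
    c, hq, wq = query_feat.shape
    _, hs, ws = support_feat.shape

    # Flatten the spatial dimensions so each column is one feature vector.
    q = query_feat.reshape(c, hq * wq)
    s = support_feat.reshape(c, hs * ws)

    # L2-normalise along the channel axis so dot products become cosines.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)

    # (Hq*Wq, Hs*Ws) matrix of pairwise cosine similarities.
    corr = q.T @ s

    # Assumption: suppress negative matches by clamping at zero.
    corr = np.maximum(corr, 0.0)

    return corr.reshape(hq, wq, hs, ws)


# Random stand-ins for backbone features (16 channels, 8x8 spatial grid).
rng = np.random.default_rng(0)
query = rng.standard_normal((16, 8, 8))
support = rng.standard_normal((16, 8, 8))

corr = correlation_map(query, support)
print(corr.shape)  # (8, 8, 8, 8)
```

Even at an 8x8 feature resolution the volume already holds 8^4 = 4096 scores, and the count grows with the fourth power of the side length, which is why reducing the map to a more tractable size before aggregation matters.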

Results

Task	Dataset	Metric	Value	Model
Few-Shot Learning	FSS-1000 (5-shot)	Mean IoU	90.6	VAT
Few-Shot Learning	COCO-20i (5-shot)	Mean IoU	47.9	VAT (ResNet-50)
Few-Shot Learning	FSS-1000 (1-shot)	Mean IoU	90	VAT
Few-Shot Learning	PASCAL-5i (1-Shot)	Mean IoU	67.5	VAT
Few-Shot Learning	COCO-20i (1-shot)	Mean IoU	41.3	VAT (ResNet-50)
Few-Shot Learning	PASCAL-5i (5-Shot)	Mean IoU	71.6	VAT
Image Matching	SPair-71k	PCK	54.2	VAT
Image Matching	PF-PASCAL	PCK	92.3	VAT
Image Matching	PF-WILLOW	PCK	81	VAT
Few-Shot Semantic Segmentation	FSS-1000 (5-shot)	Mean IoU	90.6	VAT
Few-Shot Semantic Segmentation	COCO-20i (5-shot)	Mean IoU	47.9	VAT (ResNet-50)
Few-Shot Semantic Segmentation	FSS-1000 (1-shot)	Mean IoU	90	VAT
Few-Shot Semantic Segmentation	PASCAL-5i (1-Shot)	Mean IoU	67.5	VAT
Few-Shot Semantic Segmentation	COCO-20i (1-shot)	Mean IoU	41.3	VAT (ResNet-50)
Few-Shot Semantic Segmentation	PASCAL-5i (5-Shot)	Mean IoU	71.6	VAT
Meta-Learning	FSS-1000 (5-shot)	Mean IoU	90.6	VAT
Meta-Learning	COCO-20i (5-shot)	Mean IoU	47.9	VAT (ResNet-50)
Meta-Learning	FSS-1000 (1-shot)	Mean IoU	90	VAT
Meta-Learning	PASCAL-5i (1-Shot)	Mean IoU	67.5	VAT
Meta-Learning	COCO-20i (1-shot)	Mean IoU	41.3	VAT (ResNet-50)
Meta-Learning	PASCAL-5i (5-Shot)	Mean IoU	71.6	VAT
Semantic correspondence	SPair-71k	PCK	54.2	VAT
Semantic correspondence	PF-PASCAL	PCK	92.3	VAT
Semantic correspondence	PF-WILLOW	PCK	81	VAT
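For readers unfamiliar with the two metrics in the table, here is a simplified sketch of how an intersection-over-union score and a PCK (percentage of correct keypoints) score can be computed. The helper names `binary_iou` and `pck`, the default `alpha=0.1`, and the handling of empty masks are assumptions for illustration. Each benchmark defines its own averaging protocol and its own reference size for the PCK threshold, so this does not reproduce the exact evaluation behind the numbers above.

```python
import numpy as np


def binary_iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def pck(pred_kps, gt_kps, ref_size, alpha=0.1):
    """Fraction of keypoints predicted within alpha * ref_size of the target.

    pred_kps, gt_kps: arrays of shape (N, 2) holding (x, y) coordinates.
    ref_size: reference length (for example an image or box side);
              which one is used depends on the benchmark.
    """
    pred = np.asarray(pred_kps, dtype=float)
    gt = np.asarray(gt_kps, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=1)
    return float((dist <= alpha * ref_size).mean())


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
gt_mask = [[1, 0, 0],
           [0, 1, 1]]
iou = binary_iou(pred_mask, gt_mask)
print(iou)  # 0.5 (intersection 2, union 4)

pred_kps = [[10, 10], [50, 50]]
gt_kps = [[12, 10], [50, 70]]
score = pck(pred_kps, gt_kps, ref_size=100, alpha=0.1)
print(score)  # 0.5 (distances 2 and 20 against a threshold of 10)
```

A mean IoU figure is then an average of such scores, typically over classes or over test episodes depending on the dataset's protocol.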
