AutoFocusFormer: Image Segmentation off the Grid

Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

2023-04-24, CVPR 2023
Tasks: Panoptic Segmentation, Segmentation, Semantic Segmentation, Instance Segmentation, Image Segmentation

Abstract

Real-world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tasks such as segmentation. Intuitively, retaining more pixels representing small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task. Since adaptive downsampling generates a set of pixels irregularly distributed on the image plane, we abandon the classic grid structure. Instead, we develop a novel point-based local attention block, facilitated by a balanced clustering module and a learnable neighborhood merging module, which yields representations for our point-based versions of state-of-the-art segmentation heads. Experiments show that AFF improves significantly over baseline models of similar sizes.
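
The contrast between grid downsampling and importance-based downsampling can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the 8x8 image, the hand-set scores, and the plain top-k selection rule are all assumptions made for illustration. In AFF the importance of each pixel is learned by the network, and the surviving points are then processed by point-based local attention, neither of which is modeled here.

```python
# Toy comparison of stride-2 grid downsampling with score-based downsampling.
# All names and values are hypothetical; scores are hand-set, not learned.

def grid_downsample(points, stride=2):
    """Keep only points whose row and column both lie on a regular stride."""
    return [p for p in points if p[0] % stride == 0 and p[1] % stride == 0]


def adaptive_downsample(points, scores, keep_ratio=0.25):
    """Keep the highest-scoring points, wherever they fall on the image plane."""
    n_keep = max(1, int(len(points) * keep_ratio))
    # Rank point indices by score, highest first (ties keep original order).
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore scan order for readability
    return [points[i] for i in kept]


# An 8x8 "image" with one small 3x3 object on a uniform background.
H = W = 8
points = [(r, c) for r in range(H) for c in range(W)]
small_object = {(r, c) for r in (1, 2, 3) for c in (1, 2, 3)}

# Assumed importance: high on the object, low on the background.
scores = [1.0 if p in small_object else 0.1 for p in points]

grid_kept = grid_downsample(points)
adaptive_kept = adaptive_downsample(points, scores)

grid_hits = sum(p in small_object for p in grid_kept)
adaptive_hits = sum(p in small_object for p in adaptive_kept)

print(len(grid_kept), grid_hits)
print(len(adaptive_kept), adaptive_hits)
```

With these toy scores both strategies keep 16 of the 64 points, but the grid retains only 1 of the 9 object pixels while the score-based selection retains all 9. The selected points no longer lie on a regular lattice, which is why the abstract replaces grid-based operations with point-based ones.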

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	Cityscapes val	AP	46.2	AFF-Base (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	PQ	67.7	AFF-Base (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	PQst	71.5	AFF-Base (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	PQth	62.5	AFF-Base (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	mIoU	83	AFF-Base (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	AP	44.2	AFF-Small (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	PQ	66.9	AFF-Small (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	PQst	70.8	AFF-Small (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	PQth	61.5	AFF-Small (single-scale, point-based Mask2Former)
Semantic Segmentation	Cityscapes val	mIoU	82.2	AFF-Small (single-scale, point-based Mask2Former)
Instance Segmentation	Cityscapes val	AP50	74.2	AFF-Base (single-scale, point-based Mask2Former)
Instance Segmentation	Cityscapes val	mask AP	46.2	AFF-Base (single-scale, point-based Mask2Former)
Instance Segmentation	Cityscapes val	AP50	72.8	AFF-Small (single-scale, point-based Mask2Former)
Instance Segmentation	Cityscapes val	mask AP	44	AFF-Small (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	AP	46.2	AFF-Base (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	PQ	67.7	AFF-Base (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	PQst	71.5	AFF-Base (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	PQth	62.5	AFF-Base (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	mIoU	83	AFF-Base (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	AP	44.2	AFF-Small (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	PQ	66.9	AFF-Small (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	PQst	70.8	AFF-Small (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	PQth	61.5	AFF-Small (single-scale, point-based Mask2Former)
Panoptic Segmentation	Cityscapes val	mIoU	82.2	AFF-Small (single-scale, point-based Mask2Former)
