AutoFocusFormer: Image Segmentation off the Grid

Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

2023-04-24 · CVPR 2023 · Tasks: Panoptic Segmentation, Segmentation, Semantic Segmentation, Instance Segmentation, Image Segmentation

Abstract

Real-world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tasks such as segmentation. Intuitively, retaining more pixels representing small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task. Since adaptive downsampling generates a set of pixels irregularly distributed on the image plane, we abandon the classic grid structure. Instead, we develop a novel point-based local attention block, facilitated by a balanced clustering module and a learnable neighborhood merging module, which yields representations for our point-based versions of state-of-the-art segmentation heads. Experiments show that AFF improves significantly over baseline models of similar sizes.
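The contrast the abstract draws between grid downsampling and adaptive downsampling can be illustrated with a toy sketch. This is not the paper's method: the importance scores here are hand-set stand-ins for a learned score, the function names `grid_downsample` and `adaptive_downsample` are hypothetical, and the balanced clustering and neighborhood merging modules are not modeled. It only shows that, for the same token budget, score-based selection can keep every pixel of a small object while a fixed stride keeps only a few.

```python
import numpy as np

def grid_downsample(points, stride=2):
    """Return indices of points lying on a regular grid with the given stride."""
    mask = (points[:, 0] % stride == 0) & (points[:, 1] % stride == 0)
    return np.flatnonzero(mask)

def adaptive_downsample(scores, keep_ratio=0.25):
    """Return indices of the highest-scoring fraction of points (hypothetical helper)."""
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    # argsort is ascending, so the last n_keep entries have the largest scores.
    order = np.argsort(scores, kind="stable")
    return np.sort(order[-n_keep:])

# Toy 8x8 image: uniform background with a small 3x3 "object" in one corner.
h = w = 8
ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
points = np.stack([ys.ravel(), xs.ravel()], axis=1)
object_mask = (points[:, 0] < 3) & (points[:, 1] < 3)

# Assumption: importance is 1 on the object and 0 elsewhere.
# In a real model this score would be predicted by the network.
scores = np.where(object_mask, 1.0, 0.0)

grid_idx = grid_downsample(points, stride=2)
adaptive_idx = adaptive_downsample(scores, keep_ratio=0.25)

grid_on_object = int(object_mask[grid_idx].sum())
adaptive_on_object = int(object_mask[adaptive_idx].sum())

print(f"kept by grid: {len(grid_idx)}, of which on object: {grid_on_object}")
print(f"kept adaptively: {len(adaptive_idx)}, of which on object: {adaptive_on_object}")
```

Both strategies keep the same number of points, but the adaptively kept set is no longer a regular grid, which is why the abstract moves to point-based local attention.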

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | Cityscapes val | AP | 46.2 | AFF-Base (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | PQ | 67.7 | AFF-Base (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | PQst | 71.5 | AFF-Base (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | PQth | 62.5 | AFF-Base (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | mIoU | 83 | AFF-Base (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | AP | 44.2 | AFF-Small (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | PQ | 66.9 | AFF-Small (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | PQst | 70.8 | AFF-Small (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | PQth | 61.5 | AFF-Small (single-scale, point-based Mask2Former) |
| Semantic Segmentation | Cityscapes val | mIoU | 82.2 | AFF-Small (single-scale, point-based Mask2Former) |
| Instance Segmentation | Cityscapes val | AP50 | 74.2 | AFF-Base (single-scale, point-based Mask2Former) |
| Instance Segmentation | Cityscapes val | mask AP | 46.2 | AFF-Base (single-scale, point-based Mask2Former) |
| Instance Segmentation | Cityscapes val | AP50 | 72.8 | AFF-Small (single-scale, point-based Mask2Former) |
| Instance Segmentation | Cityscapes val | mask AP | 44 | AFF-Small (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | AP | 46.2 | AFF-Base (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | PQ | 67.7 | AFF-Base (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | PQst | 71.5 | AFF-Base (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | PQth | 62.5 | AFF-Base (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | mIoU | 83 | AFF-Base (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | AP | 44.2 | AFF-Small (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | PQ | 66.9 | AFF-Small (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | PQst | 70.8 | AFF-Small (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | PQth | 61.5 | AFF-Small (single-scale, point-based Mask2Former) |
| 10-shot image generation | Cityscapes val | mIoU | 82.2 | AFF-Small (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | AP | 46.2 | AFF-Base (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | PQ | 67.7 | AFF-Base (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | PQst | 71.5 | AFF-Base (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | PQth | 62.5 | AFF-Base (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | mIoU | 83 | AFF-Base (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | AP | 44.2 | AFF-Small (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | PQ | 66.9 | AFF-Small (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | PQst | 70.8 | AFF-Small (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | PQth | 61.5 | AFF-Small (single-scale, point-based Mask2Former) |
| Panoptic Segmentation | Cityscapes val | mIoU | 82.2 | AFF-Small (single-scale, point-based Mask2Former) |
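
The page reports PQ without defining it. As a reading aid, here is a minimal sketch of the standard panoptic quality definition for a single class: the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name and the input numbers are made up for illustration and are not taken from the paper's evaluation. Benchmarks typically average this value over classes and report it as a percentage; PQst and PQth restrict the average to "stuff" and "thing" classes.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Per-class PQ from the IoUs of matched segments and unmatched counts.

    matched_ious: IoU of each true-positive (prediction, ground truth) pair.
    num_fp: predicted segments with no match.
    num_fn: ground-truth segments with no match.
    """
    num_tp = len(matched_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(matched_ious) / denom

# Illustrative numbers only: three matches, one spurious and one missed segment.
pq = panoptic_quality([0.9, 0.8, 0.6], num_fp=1, num_fn=1)
print(f"PQ = {pq:.3f}")
```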
