Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Polarized Self-Attention: Towards High-quality Pixel-wise Regression

Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang

Published 2021-07-02 · arXiv preprint, 2021
Tasks: Regression · Vocal Bursts Intensity Prediction · Segmentation · Semantic Segmentation · Pose Estimation · Keypoint Detection · 2D Pose Estimation
Links: Paper · PDF · Code (official) · additional community code repositories

Abstract

Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are particularly challenging because they require, at low computational overhead, modeling long-range dependencies on high-resolution inputs/outputs to estimate highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for capturing long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids try to reach the best compromise across multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs toward high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing non-linearity that directly fits the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D Binomial distribution (binary segmentation masks). PSA appears to have exhausted the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by 2-4 points, and boosts state-of-the-art results by 1-2 points on 2D pose estimation and semantic segmentation benchmarks.
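The "polarized filtering" idea in point (1) can be sketched in a few lines of NumPy: the channel-only branch keeps full channel resolution while collapsing the spatial dimensions, and the spatial-only branch does the reverse. This is a hedged illustration under simplifying assumptions, not the paper's implementation: the 1x1 convolutions are modeled as random channel-mixing matrices, LayerNorm is omitted, and the function names and 0.1 weight scale are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def psa_channel_branch(x, rng):
    """Channel-only attention: full channel resolution, spatial dims collapsed."""
    C, H, W = x.shape
    Ci = C // 2                          # reduced internal channels, as in the paper
    flat = x.reshape(C, H * W)
    Wq = rng.standard_normal((1, C)) * 0.1    # stands in for a 1x1 conv
    Wv = rng.standard_normal((Ci, C)) * 0.1
    Wz = rng.standard_normal((C, Ci)) * 0.1
    q = softmax(Wq @ flat, axis=-1)      # (1, HW): softmax over all spatial positions
    v = Wv @ flat                        # (Ci, HW)
    z = v @ q.T                          # (Ci, 1): spatial dimension fully collapsed
    w = sigmoid(Wz @ z)                  # (C, 1): per-channel weights in (0, 1)
    return x * w.reshape(C, 1, 1)

def psa_spatial_branch(x, rng):
    """Spatial-only attention: full spatial resolution, channel dim collapsed."""
    C, H, W = x.shape
    Ci = C // 2
    flat = x.reshape(C, H * W)
    Wq = rng.standard_normal((Ci, C)) * 0.1
    Wv = rng.standard_normal((Ci, C)) * 0.1
    q = softmax((Wq @ flat).mean(axis=1))    # (Ci,): global pool, softmax over channels
    v = Wv @ flat                            # (Ci, HW): full spatial resolution kept
    a = sigmoid(q @ v)                       # (HW,): per-pixel weights in (0, 1)
    return x * a.reshape(1, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
# Sequential layout: channel branch first, then spatial branch
y = psa_spatial_branch(psa_channel_branch(x, rng), rng)
print(y.shape)  # (8, 4, 4)
```

The sigmoid at the end of each branch is the "enhancement" nonlinearity from point (2): it maps the attention logits into (0, 1), matching the value range of heatmap and mask targets.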

Results

Task                  | Dataset       | Metric        | Value | Model
Semantic Segmentation | Cityscapes val| mIoU          | 86.93 | HRNetV2-OCR+PSA
Pose Estimation       | COCO test-dev | AP            | 79.5  | UDP-Pose-PSA (384x288)
Pose Estimation       | COCO test-dev | AP50          | 93.6  | UDP-Pose-PSA (384x288)
Pose Estimation       | COCO test-dev | AP75          | 85.9  | UDP-Pose-PSA (384x288)
Pose Estimation       | COCO test-dev | APL           | 84.3  | UDP-Pose-PSA (384x288)
Pose Estimation       | COCO test-dev | APM           | 76.3  | UDP-Pose-PSA (384x288)
Pose Estimation       | COCO test-dev | AR            | 81.9  | UDP-Pose-PSA (384x288)
Pose Estimation       | COCO test-dev | AP            | 78.9  | UDP-Pose-PSA (256x192)
Pose Estimation       | COCO test-dev | AP50          | 93.6  | UDP-Pose-PSA (256x192)
Pose Estimation       | COCO test-dev | AP75          | 85.8  | UDP-Pose-PSA (256x192)
Pose Estimation       | COCO test-dev | APL           | 83.6  | UDP-Pose-PSA (256x192)
Pose Estimation       | COCO test-dev | APM           | 76.1  | UDP-Pose-PSA (256x192)
Pose Estimation       | COCO test-dev | AR            | 81.4  | UDP-Pose-PSA (256x192)
Pose Estimation       | COCO val      | Validation AP | 79.5  | UDP-Pose-PSA (384x288)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression (2025-07-20)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)