Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang
Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are particularly challenging because they require, at low computation overhead, modeling long-range dependencies on high-resolution inputs/outputs to estimate the highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for boosting long-range dependencies, element-specific attention, such as Non-local blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids try to reach the best compromise across multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs for high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing non-linearities that directly fit the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D binomial distribution (binary segmentation masks). PSA appears to exhaust the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by $2$-$4$ points, and boosts state-of-the-art methods by $1$-$2$ points, on 2D pose estimation and semantic segmentation benchmarks.
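The polarized-filtering idea above can be sketched numerically: the channel-only branch collapses the spatial dimension entirely while keeping channel resolution, and the spatial-only branch collapses the channel dimension while keeping the full spatial map; each branch composes a softmax with a sigmoid as its enhancement non-linearity. Below is a minimal NumPy sketch under stated assumptions — $1\times1$ convolutions are written as plain matrix products, random weights stand in for learned ones, and the exact layer widths, normalization, and branch composition of the published PSA block are simplified away. All variable names (`Wq`, `Wv`, `Uq`, etc.) are illustrative, not from the paper's code.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H * W))      # input features, spatial dims flattened

# Channel-only branch: keep high channel resolution (C/2 -> C),
# collapse the spatial dimension completely.
Wq = rng.standard_normal((1, C))         # 1x1 conv, C -> 1 (query)
Wv = rng.standard_normal((C // 2, C))    # 1x1 conv, C -> C/2 (value)
Wz = rng.standard_normal((C, C // 2))    # 1x1 conv, C/2 -> C (restore channels)
q = softmax(Wq @ x, axis=1)              # (1, HW): softmax over spatial positions
z = (Wv @ x) @ q.T                       # (C/2, 1): spatial dimension collapsed
ch_att = sigmoid(Wz @ z)                 # (C, 1): per-channel attention weights
x_ch = ch_att * x                        # re-weight channels, spatial map untouched

# Spatial-only branch: keep full spatial resolution (H x W),
# collapse the channel dimension completely.
Uq = rng.standard_normal((C // 2, C))    # 1x1 conv, C -> C/2 (query)
Uv = rng.standard_normal((C // 2, C))    # 1x1 conv, C -> C/2 (value)
g = softmax((Uq @ x).mean(axis=1, keepdims=True), axis=0)  # (C/2, 1): pooled query, channel softmax
sp_att = sigmoid(g.T @ (Uv @ x))         # (1, HW): per-position attention weights
x_sp = sp_att * x                        # re-weight positions, channels untouched

# Parallel layout: sum the two branch outputs.
out = x_ch + x_sp
```

Because each branch fully collapses only its counterpart dimension, both attention maps stay cheap (one vector each) while the retained dimension keeps its full resolution, which is the property the abstract credits for the small gap between sequential and parallel layouts.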
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | Cityscapes val | mIoU | 86.93 | HRNetV2-OCR+PSA |
| Pose Estimation | COCO test-dev | AP | 79.5 | UDP-Pose-PSA(384x288) |
| Pose Estimation | COCO test-dev | AP50 | 93.6 | UDP-Pose-PSA(384x288) |
| Pose Estimation | COCO test-dev | AP75 | 85.9 | UDP-Pose-PSA(384x288) |
| Pose Estimation | COCO test-dev | APL | 84.3 | UDP-Pose-PSA(384x288) |
| Pose Estimation | COCO test-dev | APM | 76.3 | UDP-Pose-PSA(384x288) |
| Pose Estimation | COCO test-dev | AR | 81.9 | UDP-Pose-PSA(384x288) |
| Pose Estimation | COCO test-dev | AP | 78.9 | UDP-Pose-PSA(256x192) |
| Pose Estimation | COCO test-dev | AP50 | 93.6 | UDP-Pose-PSA(256x192) |
| Pose Estimation | COCO test-dev | AP75 | 85.8 | UDP-Pose-PSA(256x192) |
| Pose Estimation | COCO test-dev | APL | 83.6 | UDP-Pose-PSA(256x192) |
| Pose Estimation | COCO test-dev | APM | 76.1 | UDP-Pose-PSA(256x192) |
| Pose Estimation | COCO test-dev | AR | 81.4 | UDP-Pose-PSA(256x192) |
| Pose Estimation | COCO val | AP | 79.5 | UDP-Pose-PSA(384x288) |