AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation

Kashu Yamazaki, Taisei Hanyu, Minh Tran, Adrian de Luis, Roy McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren, Ngan Le

2023-06-12

Segmentation, Semantic Segmentation, Image Segmentation

Abstract

Aerial image segmentation is semantic segmentation from a top-down perspective. It has several challenging characteristics: a strong imbalance in the foreground-background distribution, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects. To handle these problems, we inherit the advantages of Transformers and propose AerialFormer, which unifies Transformers in the contracting path with lightweight Multi-Dilated Convolutional Neural Networks (MD-CNNs) in the expanding path. AerialFormer is designed as a hierarchical structure in which the Transformer encoder outputs multi-scale features and the MD-CNN decoder aggregates information across those scales. It thus takes both local and global context into account to produce powerful representations and high-resolution segmentation. We benchmark AerialFormer on three common datasets: iSAID, LoveDA, and Potsdam. Comprehensive experiments and extensive ablation studies show that AerialFormer outperforms previous state-of-the-art methods. Our source code will be publicly available upon acceptance.
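
The idea behind a multi-dilated convolution can be illustrated with a small 1D sketch. This is not the authors' implementation: the abstract does not say how the dilation branches are arranged or combined, so the parallel-branch layout, the function names (`dilated_conv1d`, `multi_dilated_features`, `receptive_field`), and the dilation rates `(1, 2, 3)` are all assumptions chosen for illustration. The sketch only shows how the same small kernel covers a wider context as the dilation rate grows.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D cross-correlation with a dilated kernel.

    Assumes an odd kernel length so the kernel has a well-defined center.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            j = i + (k - half) * dilation
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def multi_dilated_features(signal, kernel, dilations=(1, 2, 3)):
    """Hypothetical multi-dilated block: one output per dilation rate.

    Assumption: branches run in parallel on the same input and their
    outputs are kept side by side, one list per dilation rate.
    """
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


def receptive_field(kernel_size, dilation):
    """Span of input positions covered by a single dilated kernel."""
    return (kernel_size - 1) * dilation + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d, branch in zip((1, 2, 3), multi_dilated_features(impulse, [1, 1, 1])):
        print(d, receptive_field(3, d), branch)
```

With a unit impulse as input, each branch responds at positions spaced by its dilation rate, so small rates capture local detail while larger rates reach farther context with the same number of weights.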

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	LoveDA	Category mIoU	54.1	AerialFormer-B
Semantic Segmentation	iSAID	mIoU	69.3	AerialFormer-B
Semantic Segmentation	iSAID	mIoU	68.4	AerialFormer-S
Semantic Segmentation	iSAID	mIoU	67.5	AerialFormer-T
Semantic Segmentation	ISPRS Potsdam	Mean F1	94.1	AerialFormer-B
Semantic Segmentation	ISPRS Potsdam	Mean IoU	89.1	AerialFormer-B
Semantic Segmentation	ISPRS Potsdam	Overall Accuracy	93.9	AerialFormer-B
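
The metrics in the table (mIoU, Mean F1, Overall Accuracy) are commonly derived from a per-class confusion matrix. The sketch below uses the standard textbook definitions and is not taken from the paper or from the benchmark evaluation code. The function name `segmentation_metrics`, the row = ground truth / column = prediction layout, and the choice to score an absent class as 0 are assumptions. The table reports percentages, while this sketch returns fractions in [0, 1].

```python
def segmentation_metrics(conf):
    """Compute mIoU, mean F1, and overall accuracy from a confusion matrix.

    conf[t][p] = number of pixels with ground-truth class t predicted as p
    (assumed layout). Classes with no pixels at all score 0 here, which is
    an assumption; some evaluation protocols skip such classes instead.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                         # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp    # predicted c, true class differs
        iou_den = tp + fp + fn
        f1_den = 2 * tp + fp + fn
        ious.append(tp / iou_den if iou_den else 0.0)
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
    correct = sum(conf[c][c] for c in range(n))
    return {
        "mIoU": sum(ious) / n,
        "mean_F1": sum(f1s) / n,
        "overall_accuracy": correct / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy two-class example with made-up counts, purely illustrative.
    print(segmentation_metrics([[8, 2], [1, 9]]))
```

Because IoU penalizes false positives and false negatives more heavily than F1 does, a model's mean IoU is never higher than its mean F1 under these definitions, which matches the ordering of the Potsdam rows in the table.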
