Kashu Yamazaki, Taisei Hanyu, Minh Tran, Adrian de Luis, Roy McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren, Ngan Le
Aerial image segmentation is semantic segmentation from a top-down perspective. It poses several challenges: a strong imbalance in the foreground-background distribution, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects. To address these problems, we build on the advantages of Transformers and propose AerialFormer, which unifies Transformers on the contracting path with lightweight Multi-Dilated Convolutional Neural Networks (MD-CNNs) on the expanding path. AerialFormer is designed as a hierarchical structure in which the Transformer encoder outputs multi-scale features and the MD-CNN decoder aggregates information across those scales. It thus takes both local and global context into account to produce powerful representations and high-resolution segmentation. We benchmark AerialFormer on three common datasets: iSAID, LoveDA, and Potsdam. Comprehensive experiments and extensive ablation studies show that AerialFormer outperforms previous state-of-the-art methods. Our source code will be made publicly available upon acceptance.
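
The multi-dilated idea in the decoder can be illustrated with a toy example. The sketch below is not the authors' implementation: it works on a 1D signal in pure Python, shares one fixed kernel across branches, and uses the dilation rates (1, 2, 3), all of which are assumptions made for illustration. The actual MD-CNN blocks operate on 2D feature maps with learned weights, and their rates are not stated in this abstract. The function names `dilated_conv1d`, `multi_dilated_block`, and `receptive_field` are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D cross-correlation with zero padding and a given dilation."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_dilated_block(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 3),  # assumed rates, for illustration only
) -> List[List[float]]:
    """Run the same input through parallel branches, one per dilation rate.

    Returns one output per branch; a real block would concatenate these
    along the channel axis and mix them with a further convolution.
    """
    return [dilated_conv1d(x, kernel, d) for d in dilations]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span of input samples covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for d, branch in zip((1, 2, 3), multi_dilated_block(impulse, [1, 1, 1])):
        print(d, receptive_field(3, d), branch)
```

With a 3-tap kernel, the branches cover 3, 5, and 7 input samples respectively at the same parameter count, which is the sense in which small and large contexts are gathered in parallel.
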
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Semantic Segmentation | LoveDA | Category mIoU | 54.1 | AerialFormer-B |
| Semantic Segmentation | iSAID | mIoU | 69.3 | AerialFormer-B |
| Semantic Segmentation | iSAID | mIoU | 68.4 | AerialFormer-S |
| Semantic Segmentation | iSAID | mIoU | 67.5 | AerialFormer-T |
| Semantic Segmentation | ISPRS Potsdam | Mean F1 | 94.1 | AerialFormer-B |
| Semantic Segmentation | ISPRS Potsdam | Mean IoU | 89.1 | AerialFormer-B |
| Semantic Segmentation | ISPRS Potsdam | Overall Accuracy | 93.9 | AerialFormer-B |
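
The metrics reported in the table (mIoU, mean F1, overall accuracy) are commonly derived from a per-pixel confusion matrix. The following is a minimal sketch under that assumption, using flat lists of integer class labels. It does not reproduce each benchmark's exact protocol: which classes are averaged, and how ignored labels are handled, differ between datasets and are not specified here. The helper names `confusion_matrix` and `segmentation_metrics` are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute mean IoU, mean F1, and overall accuracy as fractions in [0, 1]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        if tp + fp + fn == 0:
            continue  # class absent from both labels and predictions: skip it
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return {
        "mean_iou": sum(ious) / len(ious) if ious else 0.0,
        "mean_f1": sum(f1s) / len(f1s) if f1s else 0.0,
        "overall_accuracy": correct / total if total else 0.0,
    }


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(segmentation_metrics(confusion_matrix(y_true, y_pred, 3)))
```

The functions return fractions; multiplying by 100 gives percentages on the same scale as the table values.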