Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang

2024-03-20
Tasks: Scene Classification, Image Classification, Change detection for remote sensing images, Object Detection In Aerial Images, Self-Supervised Learning, Segmentation, Semantic Segmentation, Building change detection for remote sensing images, Instance Segmentation, Oriented Object Detection, Aerial Scene Classification, Change Detection, object-detection, Object Detection

Abstract

Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size and their competitive performance compared to larger state-of-the-art models, thus validating the effectiveness of MTP.
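
The shared-encoder, task-specific-decoder idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the tiny linear layers, the squared-error losses, and the unweighted sum of task losses are all assumptions chosen to keep the example self-contained. It only shows the structural point that the encoder runs once per input and every task head consumes the same features.

```python
# Toy illustration of multi-task training with a shared encoder.
# All names and the loss form are hypothetical; real MTP uses deep
# backbones (e.g. ViT, InternImage) and task-specific decoders.

def encoder(x, w_enc):
    """Shared encoder: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_enc]

def head(features, w_head):
    """Task-specific decoder, reduced here to a single linear unit."""
    return sum(w * f for w, f in zip(w_head, features))

def multitask_loss(x, targets, w_enc, heads):
    """Run the encoder once, then sum per-task squared errors.

    Assumption: tasks are combined by an unweighted sum.
    """
    features = encoder(x, w_enc)  # computed once, shared by all tasks
    per_task = {
        task: (head(features, heads[task]) - y) ** 2
        for task, y in targets.items()
    }
    return sum(per_task.values()), per_task

# One input, three pretraining tasks named after those in the abstract.
x = [1.0, 2.0]
w_enc = [[1.0, 0.0], [0.0, -1.0]]
heads = {
    "semantic_seg": [2.0, 1.0],
    "instance_seg": [0.5, 0.5],
    "rotated_det": [-1.0, 3.0],
}
targets = {"semantic_seg": 1.0, "instance_seg": 0.5, "rotated_det": 0.0}

total, per_task = multitask_loss(x, targets, w_enc, heads)
print(total, per_task)
```

Because every head's error flows back into the same encoder weights, the encoder is pushed toward features useful for all three tasks at once, which is the motivation the abstract gives for reducing task discrepancy at finetuning time.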

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | LoveDA | Category mIoU | 54.17 | MAE+MTP(ViT-L+RVSA)
Semantic Segmentation | LoveDA | Category mIoU | 54.17 | IMP+MTP(InternImage-XL)
Semantic Segmentation | LoveDA | Category mIoU | 52.39 | MAE+MTP(ViT-B+RVSA)
Semantic Segmentation | SpaceNet 1 | Mean IoU | 79.69 | MAE+MTP(ViT-L)
Semantic Segmentation | SpaceNet 1 | Mean IoU | 79.63 | MAE+MTP(ViT-B+RVSA)
Semantic Segmentation | SpaceNet 1 | Mean IoU | 79.54 | MAE+MTP(ViT-L+RVSA)
Semantic Segmentation | SpaceNet 1 | Mean IoU | 79.16 | IMP+MTP(InternImage-XL)
Remote Sensing | CDD Dataset (season-varying) | F1-Score | 0.9833 | IMP+MTP(InternImage-XL)
Remote Sensing | CDD Dataset (season-varying) | F1-Score | 0.9798 | MAE+MTP(ViT-L+RVSA)
Remote Sensing | CDD Dataset (season-varying) | F1-Score | 0.9787 | MAE+MTP(ViT-B+RVSA)
Remote Sensing | LEVIR-CD | F1 | 92.67 | MAE+MTP(ViT-L+RVSA)
Remote Sensing | LEVIR-CD | Params(M) | 305 | MAE+MTP(ViT-L+RVSA)
Remote Sensing | LEVIR-CD | F1 | 92.54 | IMP+MTP(InternImage-XL)
Remote Sensing | LEVIR-CD | Params(M) | 335 | IMP+MTP(InternImage-XL)
Remote Sensing | LEVIR-CD | F1 | 92.22 | MAE+MTP(ViT-B+RVSA)
Remote Sensing | LEVIR-CD | Params(M) | 86 | MAE+MTP(ViT-B+RVSA)
Object Detection | DIOR | AP50 | 81.1 | MAE+MTP(ViT-L+RVSA)
Object Detection | DIOR | AP50 | 79.4 | MAE+MTP(ViT-B+RVSA)
Object Detection | DIOR | AP50 | 78 | IMP+MTP(InternImage-XL)
Object Detection | FAIR1M-2.0 | mAP | 53 | MAE+MTP(ViT-L+RVSA)
Object Detection | FAIR1M-2.0 | mAP | 51.92 | MAE+MTP(ViT-B+RVSA)
Object Detection | FAIR1M-2.0 | mAP | 50.93 | IMP+MTP(InternImage-XL)
Object Detection | xView | AP50 | 19.4 | MAE+MTP(ViT-L+RVSA)
Object Detection | xView | AP50 | 18.2 | IMP+MTP(InternImage-XL)
Object Detection | xView | AP50 | 16.4 | MAE+MTP(ViT-B+RVSA)
Object Detection | DIOR-R | mAP | 74.54 | MAE+MTP(ViT-L+RVSA)
Object Detection | DIOR-R | mAP | 72.17 | IMP+MTP(InternImage-XL)
Object Detection | DIOR-R | mAP | 71.29 | MAE+MTP(ViT-B+RVSA)
Image Classification | EuroSAT | Accuracy (%) | 99.24 | IMP+MTP(InternImage-XL)
Image Classification | EuroSAT | Accuracy (%) | 98.78 | MAE+MTP(ViT-L+RVSA)
Image Classification | EuroSAT | Accuracy (%) | 98.76 | MAE+MTP(ViT-B+RVSA)
Change Detection | GVLM | F1 | 89.9 | MTP (ViT-B + RVSA)
Change Detection | CLCD | F1 | 80.3 | MTP (ViT-B + RVSA)
Change Detection | EGY-BCD | F1 | 85.9 | MTP (ViT-B+RVSA)
Change Detection | WHU Building Dataset | F1-score | 0.9559 | IMP+MTP(InternImage-XL)
Change Detection | WHU Building Dataset | F1-score | 0.9475 | MAE+MTP(ViT-L+RVSA)
Change Detection | WHU Building Dataset | F1-score | 0.9432 | MAE+MTP(ViT-B+RVSA)
Change Detection | LEVIR-CD | F1 | 92.67 | MAE+MTP(ViT-L+RVSA)
Change Detection | LEVIR-CD | F1 | 92.54 | IMP+MTP(InternImage-XL)
Change Detection | LEVIR-CD | F1 | 92.22 | MAE+MTP(ViT-B+RVSA)
Change Detection | OSCD - 3ch | F1 | 55.92 | MAE+MTP(ViT-L+RVSA)
Change Detection | OSCD - 3ch | F1 | 55.61 | IMP+MTP(InternImage-XL)
Change Detection | OSCD - 3ch | F1 | 53.36 | MAE+MTP(ViT-B+RVSA)
Change Detection | CDD Dataset (season-varying) | F1-Score | 98.33 | IMP+MTP(InternImage-XL)
Change Detection | CDD Dataset (season-varying) | F1-Score | 97.98 | MAE+MTP(ViT-L+RVSA)
Change Detection | CDD Dataset (season-varying) | F1-Score | 97.87 | MAE+MTP(ViT-B+RVSA)
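
The results above report F1 on two scales: some entries are fractions (for example 0.9833 on CDD and 0.9559 on WHU Building Dataset) while others are percentages (for example 92.67 on LEVIR-CD and 98.33 on CDD). The sketch below shows the standard definitions of F1 and IoU from true-positive, false-positive, and false-negative counts, plus a small helper that puts both scales on a common percent footing. The helper names are hypothetical, and the rule "values at or below 1.0 are fractions" is an assumption that fits this table but would misread a genuine score below 1%.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), returned as a fraction in [0, 1]."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def iou(tp, fp, fn):
    """Intersection over union = TP / (TP + FP + FN), as a fraction in [0, 1]."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def to_percent(value):
    """Heuristic (assumption): treat values <= 1.0 as fractions and rescale."""
    return value * 100 if value <= 1.0 else value

# Made-up counts for illustration, not taken from any dataset above.
print(f1_score(90, 10, 10), iou(90, 10, 10))

# The two CDD entries in the table describe the same score on different scales.
print(to_percent(0.9833), to_percent(98.33))
```

Mean IoU averages the per-class IoU values, so the same `iou` function applied class by class and then averaged gives the mIoU reported for the segmentation datasets.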

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)