
A Billion-scale Foundation Model for Remote Sensing Images

Keumgang Cha, Junghoon Seo, Taekyung Lee

2023-04-11 · Object Detection in Aerial Images · Semantic Segmentation · Object Detection

Abstract

As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improved with an increase in parameters. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate general performance in downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrated that, across all benchmark datasets and downstream tasks, both the performance and the data efficiency of the foundation models improved as the number of parameters increased. Moreover, our models achieve state-of-the-art performance on several datasets, including DIOR-R, Potsdam, and LoveDA.
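The abstract does not spell out the architectures behind the 86M, 605.26M, 1.3B, and 2.4B parameter counts. As a rough illustration of how a plain ViT backbone scales into that range, here is a minimal PyTorch sketch; the ViTBackbone class and every dim/depth/heads setting in it are assumptions following common ViT scaling conventions, not the paper's actual configurations (e.g., its ViT-G12X4 variant).

```python
# Hypothetical sketch: ViT backbones at roughly the four parameter scales
# discussed in the paper. The depth/width settings are assumed, chosen only
# to land near the reported sizes; they are not the paper's configs.
import torch
import torch.nn as nn

class ViTBackbone(nn.Module):
    def __init__(self, dim, depth, heads, mlp_ratio=4.0, patch=16, img=224):
        super().__init__()
        # Patchify the image into (img/patch)^2 tokens of width `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img // patch) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads,
            dim_feedforward=int(dim * mlp_ratio),
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens + self.pos_embed)  # patch tokens for a task head

# Assumed scaling ladder (parameter counts are approximate):
configs = {
    "86M-class":  dict(dim=768,  depth=12, heads=12),
    "0.6B-class": dict(dim=1280, depth=32, heads=16),
    "1.3B-class": dict(dim=1664, depth=40, heads=16),
    "2.4B-class": dict(dim=2048, depth=48, heads=16),
}
with torch.device("meta"):  # construct without allocating real weights
    for name, cfg in configs.items():
        n = sum(p.numel() for p in ViTBackbone(**cfg).parameters())
        print(f"{name}: ~{n / 1e6:.0f}M parameters")
```

Constructing on the meta device keeps the sketch runnable without allocating billions of real weights; in practice the backbone's output tokens would feed a rotated-detection or segmentation head during fine-tuning.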

Results

Task                  | Dataset       | Metric           | Value | Model
Semantic Segmentation | LoveDA        | Category mIoU    | 54.4  | ViT-G12X4
Semantic Segmentation | ISPRS Potsdam | Mean F1          | 92.12 | ViT-G12X4
Semantic Segmentation | ISPRS Potsdam | Overall Accuracy | 92.58 | ViT-G12X4
Object Detection      | DIOR-R        | mAP              | 73.6  | ViT-G12X4
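The page does not define the metrics in the table. Below is a minimal sketch of how Category mIoU (LoveDA) and Mean F1 / Overall Accuracy (ISPRS Potsdam) are conventionally computed from a per-pixel confusion matrix; the segmentation_metrics helper and the toy matrix are illustrative assumptions, not PWC's or the paper's evaluation code.

```python
# Minimal sketch (assumed standard definitions): segmentation metrics
# from a per-pixel confusion matrix.
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """conf[i, j] = number of pixels with true class i predicted as class j."""
    tp = np.diag(conf).astype(float)   # correctly classified pixels per class
    fp = conf.sum(axis=0) - tp         # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp         # belong to the class, but missed
    iou = tp / (tp + fp + fn)          # per-class intersection-over-union
    f1 = 2 * tp / (2 * tp + fp + fn)   # per-class F1 (Dice)
    return {
        "Category mIoU": 100 * iou.mean(),                # class mean (LoveDA)
        "Mean F1": 100 * f1.mean(),                       # class mean (Potsdam)
        "Overall Accuracy": 100 * tp.sum() / conf.sum(),  # pixel accuracy
    }

# Toy 3-class example (made-up counts):
conf = np.array([[50, 2, 1],
                 [3, 40, 2],
                 [1, 1, 60]])
print(segmentation_metrics(conf))
```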

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)