Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification

Karim El Khoury, Maxime Zanella, Benoît Gérin, Tiffanie Godelaine, Benoît Macq, Saïd Mahmoudi, Christophe De Vleeschouwer, Ismail Ben Ayed

2024-09-01 · Scene Classification · Transductive Zero-Shot Classification · Zero-Shot Learning

Paper · PDF · Code (official)

Abstract

Vision-Language Models for remote sensing have shown promising use thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and making independent predictions, i.e., inductive inference, thereby limiting their effectiveness by ignoring valuable contextual information. Our approach tackles this issue by utilizing initial predictions based on text prompting and patch affinity relationships from the image encoder to enhance zero-shot capabilities through transductive inference, all without the need for supervision and at a minor computational cost. Experiments on 10 remote sensing datasets with state-of-the-art Vision-Language Models demonstrate significant accuracy improvements over inductive zero-shot classification. Our source code is publicly available on GitHub: https://github.com/elkhouryk/RS-TransCLIP
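The abstract's core idea (initial text-prompt predictions refined via patch affinities, with no supervision) can be illustrated with a generic transductive label-propagation sketch. This is not the authors' RS-TransCLIP objective, only a minimal stand-in under assumed inputs: L2-normalized patch embeddings `image_feats` and class-prompt embeddings `text_feats`; the function name, the top-k affinity construction, and the blending weight `alpha` are all illustrative choices.

```python
import numpy as np

def transductive_zero_shot(image_feats, text_feats, alpha=0.5, n_iters=10, k=5):
    """Refine per-patch zero-shot predictions with label propagation (a sketch,
    not the RS-TransCLIP algorithm).

    image_feats: (N, d) L2-normalized patch embeddings
    text_feats:  (C, d) L2-normalized class-prompt embeddings
    Returns refined class probabilities of shape (N, C).
    """
    # Inductive starting point: CLIP-style text-prompt predictions per patch.
    logits = 100.0 * image_feats @ text_feats.T
    z0 = np.exp(logits - logits.max(axis=1, keepdims=True))
    z0 /= z0.sum(axis=1, keepdims=True)

    # Patch affinities from the image encoder: keep top-k cosine neighbours.
    sim = image_feats @ image_feats.T
    np.fill_diagonal(sim, -np.inf)                # exclude self-affinity
    idx = np.argpartition(-sim, k, axis=1)[:, :k]
    w = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    w[rows, idx] = np.clip(sim[rows, idx], 0.0, None)
    w = 0.5 * (w + w.T)                           # symmetrize the graph
    w /= w.sum(axis=1, keepdims=True) + 1e-8      # row-normalize

    # Transductive refinement: blend neighbour consensus with the text prior.
    z = z0.copy()
    for _ in range(n_iters):
        z = alpha * (w @ z) + (1.0 - alpha) * z0
    return z
```

Because every patch's prediction is updated jointly from its neighbours, contextually similar patches converge toward consistent labels, which is the intuition behind the transductive gains the abstract reports.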

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Zero-Shot Learning | EuroSAT | Accuracy | 91.2 | RS-TransCLIP |
| Zero-Shot Learning | RSICB256 | Accuracy | 72.8 | RS-TransCLIP |
| Zero-Shot Learning | OPTIMAL31 | Accuracy | 94.5 | RS-TransCLIP |
| Zero-Shot Learning | WHURS19 | Accuracy | 99.7 | RS-TransCLIP |
| Zero-Shot Learning | PatternNet | Accuracy | 96.2 | RS-TransCLIP |
| Zero-Shot Learning | RESISC45 | Accuracy | 88 | RS-TransCLIP |
| Zero-Shot Learning | AID | Accuracy | 92.7 | RS-TransCLIP |
| Zero-Shot Learning | MLRSNet | Accuracy | 78.1 | RS-TransCLIP |
| Zero-Shot Learning | RSC11 | Accuracy | 88.1 | RS-TransCLIP |
| Zero-Shot Learning | RSICB128 | Accuracy | 54.8 | RS-TransCLIP |

Related Papers

- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
- EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning (2025-06-26)
- Zero-Shot Learning for Obsolescence Risk Forecasting (2025-06-26)
- Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition (2025-06-25)
- SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)
- A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement (2025-06-23)
- Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation (2025-06-20)