Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization

Chen Mao, Jingqi Hu

2024-06-04 · geo-localization · Visual Place Recognition
Paper · PDF · Code (official)

Abstract

Visual Geo-localization (VG) is the task of identifying the location depicted in query images, and it is widely applied in robotics and computer vision, including autonomous driving, the metaverse, augmented reality, and SLAM. For fine-grained images that lack specific text descriptions, directly applying purely visual methods to represent neighborhood features often causes the model to focus on overly fine-grained details and fail to fully mine the semantic information in the images. We therefore propose a two-stage training method that enhances visual performance and uses contrastive learning to mine challenging samples. We first leverage the multi-modal description capability of CLIP (Contrastive Language-Image Pretraining) to create a set of learnable text prompts for each geographic image feature, forming vague descriptions. Then, by using these dynamic text prompts to assist the training of the image encoder, we enable the image encoder to learn better and more generalizable visual features. This strategy of applying text to purely visual tasks addresses the difficulty of using multi-modal models for geographic images, which typically lack precise descriptions and are therefore hard to exploit widely. We validate the proposed strategy on several large-scale visual geo-localization datasets, where our method achieves competitive results. Our code and model are available at https://github.com/Chain-Mao/ProGEO.
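The prompt-assisted training described above rests on CLIP's symmetric image-text contrastive objective, which pulls each image embedding toward its paired text-prompt embedding and pushes it away from the other pairs in the batch. The following is a minimal pure-Python sketch of that objective (symmetric InfoNCE); the function names and temperature value are illustrative and not taken from the ProGEO codebase.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as CLIP does before computing similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def symmetric_infonce(image_feats, text_feats, temperature=0.07):
    """CLIP-style symmetric contrastive loss.

    image_feats[i] and text_feats[i] form the i-th positive pair;
    every other pairing in the batch serves as a negative.
    """
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in text_feats]
    n = len(img)
    # Cosine-similarity logits, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # -log softmax probability of the matching (diagonal) entry, averaged.
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_z - row[i]
        return loss / len(rows)

    # Average the image-to-text and text-to-image directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))
```

With perfectly aligned pairs the loss approaches zero, while mismatched pairs drive it up; during ProGEO-style training the gradient of such a loss is what shapes both the learnable prompts and the image encoder.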

Results

Task                     | Dataset               | Metric   | Value | Model
Visual Place Recognition | SF-XL test v1         | Recall@1 | 84.7  | ProGEO
Visual Place Recognition | SF-XL test v1         | Recall@5 | 90.3  | ProGEO
Visual Place Recognition | St Lucia              | Recall@1 | 99.7  | ProGEO
Visual Place Recognition | St Lucia              | Recall@5 | 99.9  | ProGEO
Visual Place Recognition | Pittsburgh-250k-test  | Recall@1 | 92.2  | ProGEO
Visual Place Recognition | Pittsburgh-250k-test  | Recall@5 | 97.7  | ProGEO
Visual Place Recognition | Pittsburgh-30k-test   | Recall@1 | 93    | ProGEO
Visual Place Recognition | Pittsburgh-30k-test   | Recall@5 | 98.3  | ProGEO
Visual Place Recognition | Tokyo247              | Recall@1 | 88.6  | ProGEO
Visual Place Recognition | Tokyo247              | Recall@5 | 93.3  | ProGEO
Visual Place Recognition | SF-XL test v2         | Recall@1 | 93    | ProGEO
Visual Place Recognition | SF-XL test v2         | Recall@5 | 96.7  | ProGEO
Visual Place Recognition | MSLS                  | Recall@1 | 84.9  | ProGEO
Visual Place Recognition | MSLS                  | Recall@5 | 91.6  | ProGEO
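The Recall@K figures above are the standard place-recognition metric: the percentage of queries for which at least one correct database image appears among the top K retrievals. A minimal sketch of how such a score is computed (the function name and percentage convention are illustrative):

```python
def recall_at_k(retrieved, ground_truth, k):
    """Recall@K for place recognition, as a percentage.

    retrieved[q]    -- database indices ranked by similarity for query q
    ground_truth[q] -- set of database indices considered correct for q
    """
    hits = sum(1 for q, ranked in enumerate(retrieved)
               if any(idx in ground_truth[q] for idx in ranked[:k]))
    return 100.0 * hits / len(retrieved)
```

Because a query counts as a hit when any of its top-K results is correct, Recall@5 is always at least as high as Recall@1, which matches the pattern in every dataset row above.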

Related Papers

Visual Place Recognition for Large-Scale UAV Applications (2025-07-20)
Grid-Reg: Grid-Based SAR and Optical Image Registration Across Platforms (2025-07-06)
Query-Based Adaptive Aggregation for Multi-Dataset Joint Training Toward Universal Visual Place Recognition (2025-07-04)
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation (2025-06-19)
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models (2025-06-17)
A Two-stage Optimization Method for Wide-range Single-electron Quantum Magnetic Sensing (2025-06-16)
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning (2025-06-06)
HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition (2025-06-05)