Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization

Sijie Zhu, Mubarak Shah, Chen Chen

2022-03-31 | CVPR 2022 | geo-localization, Image-Based Localization

Abstract

The dominant CNN-based methods for cross-view image geo-localization rely on the polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. TransGeo takes full advantage of the transformer's strengths in global information modeling and explicit position information encoding. We further leverage the flexibility of the transformer input and propose an attention-guided non-uniform cropping method, so that uninformative image patches are removed with a negligible drop in performance to reduce computation cost. The saved computation can be reallocated to increase resolution only for informative patches, resulting in performance improvement with no additional computation cost. This "attend and zoom-in" strategy is highly similar to human behavior when observing images. Remarkably, TransGeo achieves state-of-the-art results on both urban and rural datasets, with significantly less computation cost than CNN-based methods. It does not rely on the polar transform and infers faster than CNN-based methods. Code is available at https://github.com/Jeff-Zilence/TransGeo2022.
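The patch-removal step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes one attention score per image patch is already available, and the names `select_informative_patches`, `crop_tokens`, and `keep_ratio` are hypothetical, chosen only for illustration.

```python
import numpy as np

def select_informative_patches(attention, keep_ratio=0.5):
    """Return indices of the highest-attention patches, in original order.

    attention:  1-D array with one score per patch (assumed given).
    keep_ratio: fraction of patches to keep (hypothetical parameter).
    """
    attention = np.asarray(attention, dtype=float)
    num_keep = max(1, int(round(attention.size * keep_ratio)))
    # argsort is ascending, so the last num_keep entries are the largest scores.
    top = np.argsort(attention, kind="stable")[-num_keep:]
    # Restore spatial order so each kept token still lines up with its position.
    return np.sort(top)

def crop_tokens(tokens, positions, keep_idx):
    """Drop uninformative patch tokens together with their position entries."""
    return tokens[keep_idx], positions[keep_idx]

# Toy example: 6 patches, each with a 4-dimensional token.
attention = np.array([0.05, 0.30, 0.10, 0.25, 0.02, 0.28])
tokens = np.arange(6 * 4, dtype=float).reshape(6, 4)
positions = np.arange(6)

keep_idx = select_informative_patches(attention, keep_ratio=0.5)
kept_tokens, kept_positions = crop_tokens(tokens, positions, keep_idx)
print(keep_idx)           # indices of the retained patches
print(kept_tokens.shape)  # fewer tokens enter the later transformer layers
```

Because a transformer accepts a variable-length token sequence, dropping tokens this way shortens the input directly, whereas a CNN would still have to process the full rectangular image.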

Results

Task                 Dataset           Metric     Value (%)  Model
Object Localization  CVUSA             Recall@1   94.08      TransGeo
Object Localization  CVUSA             Recall@5   98.36      TransGeo
Object Localization  CVUSA             Recall@10  99.04      TransGeo
Object Localization  CVUSA             Recall@1%  99.77      TransGeo
Object Localization  CVACT             Recall@1   84.95      TransGeo
Object Localization  CVACT             Recall@5   94.14      TransGeo
Object Localization  CVACT             Recall@10  95.78      TransGeo
Object Localization  CVACT             Recall@1%  98.37      TransGeo
Object Localization  VIGOR Cross Area  Recall@1   18.99      TransGeo
Object Localization  VIGOR Cross Area  Recall@5   38.24      TransGeo
Object Localization  VIGOR Cross Area  Recall@10  46.91      TransGeo
Object Localization  VIGOR Cross Area  Recall@1%  88.94      TransGeo
Object Localization  VIGOR Cross Area  Hit Rate   21.21      TransGeo
Object Localization  VIGOR Same Area   Recall@1   61.48      TransGeo
Object Localization  VIGOR Same Area   Recall@5   87.54      TransGeo
Object Localization  VIGOR Same Area   Recall@10  91.88      TransGeo
Object Localization  VIGOR Same Area   Recall@1%  99.56      TransGeo
Object Localization  VIGOR Same Area   Hit Rate   73.09      TransGeo
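
For readers unfamiliar with the Recall@K metric reported above, the following is a minimal sketch of how such a number is commonly computed for retrieval. It assumes a query-by-reference similarity matrix and one ground-truth reference per query. The function name `recall_at_k` and the toy data are illustrative only and are not taken from the paper's evaluation code.

```python
import numpy as np

def recall_at_k(similarity, gt_index, k):
    """Percentage of queries whose true match ranks within the top k references.

    similarity: (num_queries, num_references) array, higher means more similar.
    gt_index:   gt_index[i] is the reference index that matches query i.
    """
    similarity = np.asarray(similarity, dtype=float)
    gt_index = np.asarray(gt_index)
    gt_scores = similarity[np.arange(len(gt_index)), gt_index]
    # Rank of the true match = number of references scoring strictly higher.
    rank = (similarity > gt_scores[:, None]).sum(axis=1)
    return 100.0 * float(np.mean(rank < k))

# Toy example: 4 ground-level queries against 4 aerial references.
similarity = np.array([
    [0.9, 0.1, 0.2, 0.3],   # true match (index 0) ranked first
    [0.8, 0.7, 0.1, 0.2],   # true match (index 1) ranked second
    [0.1, 0.2, 0.9, 0.3],   # true match (index 2) ranked first
    [0.6, 0.5, 0.4, 0.3],   # true match (index 3) ranked last
])
gt_index = np.array([0, 1, 2, 3])

print(recall_at_k(similarity, gt_index, k=1))  # 50.0
print(recall_at_k(similarity, gt_index, k=2))  # 75.0
```

Under the same assumptions, a Recall@1% figure would use a `k` equal to roughly one percent of the number of references, for example `max(1, num_references // 100)`.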

Related Papers

Modeling Code: Is Text All You Need? (2025-07-15)
All Eyes, no IMU: Learning Flight Attitude from Vision Alone (2025-07-15)
Is Diversity All You Need for Scalable Robotic Manipulation? (2025-07-08)
DESIGN AND IMPLEMENTATION OF ONLINE CLEARANCE REPORT. (2025-07-07)
Grid-Reg: Grid-Based SAR and Optical Image Registration Across Platforms (2025-07-06)
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models (2025-07-03)
Prompt2SegCXR: Prompt to Segment All Organs and Diseases in Chest X-rays (2025-07-01)
State and Memory is All You Need for Robust and Reliable AI Agents (2025-06-30)