Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Where in the World is this Image? Transformer-based Geo-localization in the Wild

Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

Published: 2022-04-29
Tasks: geo-localization, Scene Recognition, Semantic Segmentation, Photo geolocation estimation

Abstract

Predicting the geographic location (geo-localization) of a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include the huge diversity of images across environmental scenarios and the drastic variation in the appearance of the same location with time of day, weather, and season. More importantly, the prediction must be made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state-of-the-art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.

Results

| Task | Dataset | Metric | Accuracy (%) | Model |
| Image Classification | Im2GPS3k | City level (25 km) | 31.1 | Translocator |
| Image Classification | Im2GPS3k | Continent level (2500 km) | 80.1 | Translocator |
| Image Classification | Im2GPS3k | Country level (750 km) | 58.9 | Translocator |
| Image Classification | Im2GPS3k | Region level (200 km) | 46.7 | Translocator |
| Image Classification | Im2GPS3k | Street level (1 km) | 11.8 | Translocator |
| Image Classification | YFCC26k | City level (25 km) | 17.8 | Translocator |
| Image Classification | YFCC26k | Continent level (2500 km) | 60.6 | Translocator |
| Image Classification | YFCC26k | Country level (750 km) | 41.3 | Translocator |
| Image Classification | YFCC26k | Region level (200 km) | 28.0 | Translocator |
| Image Classification | YFCC26k | Street level (1 km) | 7.2 | Translocator |
| Image Classification | GWS15k | City level (25 km) | 1.1 | Translocator |
| Image Classification | GWS15k | Continent level (2500 km) | 48.3 | Translocator |
| Image Classification | GWS15k | Country level (750 km) | 25.5 | Translocator |
| Image Classification | GWS15k | Region level (200 km) | 8.0 | Translocator |
| Image Classification | GWS15k | Street level (1 km) | 0.5 | Translocator |
| 4K 60Fps | Im2GPS3k | City level (25 km) | 31.1 | Translocator |
| 4K 60Fps | Im2GPS3k | Continent level (2500 km) | 80.1 | Translocator |
| 4K 60Fps | Im2GPS3k | Country level (750 km) | 58.9 | Translocator |
| 4K 60Fps | Im2GPS3k | Region level (200 km) | 46.7 | Translocator |
| 4K 60Fps | Im2GPS3k | Street level (1 km) | 11.8 | Translocator |
| 4K 60Fps | YFCC26k | City level (25 km) | 17.8 | Translocator |
| 4K 60Fps | YFCC26k | Continent level (2500 km) | 60.6 | Translocator |
| 4K 60Fps | YFCC26k | Country level (750 km) | 41.3 | Translocator |
| 4K 60Fps | YFCC26k | Region level (200 km) | 28.0 | Translocator |
| 4K 60Fps | YFCC26k | Street level (1 km) | 7.2 | Translocator |
| 4K 60Fps | GWS15k | City level (25 km) | 1.1 | Translocator |
| 4K 60Fps | GWS15k | Continent level (2500 km) | 48.3 | Translocator |
| 4K 60Fps | GWS15k | Country level (750 km) | 25.5 | Translocator |
| 4K 60Fps | GWS15k | Region level (200 km) | 8.0 | Translocator |
| 4K 60Fps | GWS15k | Street level (1 km) | 0.5 | Translocator |
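
Each metric in the results above is the share of test images whose predicted location falls within a given distance of the true location. The following is a minimal sketch of how such threshold accuracies could be computed, not the authors' evaluation code. It assumes great-circle (haversine) distance on a spherical Earth of radius 6371 km and an inclusive "within" comparison; the names `great_circle_km`, `accuracy_at_thresholds`, and `THRESHOLDS_KM` are hypothetical.

```python
import math

# Assumption: mean Earth radius for a spherical approximation.
EARTH_RADIUS_KM = 6371.0

# Distance thresholds as named in the results listing.
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_thresholds(predicted, actual):
    """Percentage of predictions within each threshold of the true location.

    `predicted` and `actual` are equal-length lists of (lat, lon) tuples.
    Assumption: a prediction counts as correct when distance <= threshold.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [great_circle_km(*p, *a) for p, a in zip(predicted, actual)]
    return {
        name: 100.0 * sum(d <= km for d in distances) / len(distances)
        for name, km in THRESHOLDS_KM.items()
    }
```

For example, calling `accuracy_at_thresholds([(48.8566, 2.3522)], [(51.5074, -0.1278)])` with a Paris prediction for a London image would count as correct at the country and continent levels but not at the street, city, or region levels, since the two cities are a few hundred kilometres apart.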
