Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

Brandon Clark, Alec Kerrigan, Parth Parag Kulkarni, Vicente Vivanco Cepeda, Mubarak Shah

2023-03-07, CVPR 2023 1
Tasks: geo-localization, Image-Based Localization, Photo geolocation estimation, Memorization

Abstract

Determining the exact latitude and longitude at which a photo was taken is a useful and widely applicable task, yet it remains exceptionally difficult despite the accelerated progress in other computer vision tasks. Most previous approaches learn a single representation of the query image, which is then classified at different levels of geographic granularity. These approaches fail to exploit the different visual cues that give context to different hierarchies, such as the country, state, and city levels. To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention. We achieve this by learning a query for each geographic hierarchy and scene type. Furthermore, we learn a separate representation for different environmental scenes, since different scenes in the same location are often defined by completely different visual features. We achieve state-of-the-art street-level accuracy on four standard geo-localization datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k), and we qualitatively demonstrate how our method learns different representations for different visual hierarchies and scenes, which previous methods have not demonstrated. These existing test datasets consist mostly of iconic landmarks or images taken from social media, which either reduces evaluation to memorization or biases it toward certain places. To address this issue, we introduce a much harder test dataset, Google-World-Streets-15k, composed of images taken from Google Street View covering the whole planet, and present state-of-the-art results on it. Our code will be made available in the camera-ready version.
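The core mechanism described above, learned queries that cross-attend to image features, can be illustrated with a minimal single-head sketch. It rests on several assumptions: the query names are hypothetical (hierarchy levels taken from the abstract's country/state/city examples, plus illustrative scene types), random vectors stand in for parameters that would be learned end to end, and keys and values are the raw patch tokens with no projection matrices. It does not reproduce the authors' GeoDecoder layers, heads, or training.

```python
import math
import random

Vector = list[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, tokens: list[Vector]) -> tuple[Vector, Vector]:
    """Scaled dot-product attention of one query over image tokens.

    Keys and values are the tokens themselves (no learned projections),
    a simplification for illustration only.
    """
    d = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    out = [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)]
    return out, weights


# Hypothetical query names, not the paper's actual configuration.
QUERY_NAMES = ["country", "state", "city", "scene:urban", "scene:natural", "scene:indoor"]


def init_queries(dim: int, seed: int = 0) -> dict[str, Vector]:
    # Random vectors stand in for learned query parameters.
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 1.0) for _ in range(dim)] for name in QUERY_NAMES}


def decode(tokens: list[Vector], queries: dict[str, Vector]) -> dict[str, Vector]:
    # Each query pools the same image tokens with its own attention pattern,
    # giving a separate representation per hierarchy and per scene type.
    return {name: cross_attend(q, tokens)[0] for name, q in queries.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 8
    # 16 random vectors stand in for patch embeddings from an image encoder.
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(16)]
    reps = decode(tokens, init_queries(dim))
    for name, vec in reps.items():
        print(name, [round(x, 3) for x in vec[:3]])
```

In a full model, each hierarchy representation would typically feed a classifier over that level's geographic cells; that head is omitted here to keep the attention step in focus.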

Results

Model: GeoDecoder. Task as listed on the leaderboard: Image Classification.
Each value is the percentage of test images localized within the given distance of the ground-truth location. GWS15k is Google-World-Streets-15k.

Dataset     Street (1 km)   City (25 km)   Region (200 km)   Country (750 km)   Continent (2500 km)
Im2GPS3k    12.8            33.5           45.9              61                 76.1
YFCC26k     10.1            23.9           34.1              49.6               69
GWS15k      0.7             1.5            8.7               26.9               50.5
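The thresholds in the results follow the standard evaluation for these benchmarks: a prediction counts as correct at a given level if its great-circle distance to the ground-truth coordinates is within that level's radius, and the reported value is the percentage of correct test images. The sketch below computes these percentages under stated assumptions: haversine distance on a spherical Earth with an assumed mean radius of 6371 km, and hypothetical function names. It is not the authors' evaluation script.

```python
import math

# Mean Earth radius in km; a common choice, assumed here.
EARTH_RADIUS_KM = 6371.0

# Distance thresholds from the results table.
THRESHOLDS_KM = {"street": 1.0, "city": 25.0, "region": 200.0, "country": 750.0, "continent": 2500.0}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_thresholds(preds, truths):
    """Percentage of predictions within each threshold of the ground truth."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("preds and truths must be non-empty and equal in length")
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return {
        name: 100.0 * sum(d <= km for d in dists) / len(dists)
        for name, km in THRESHOLDS_KM.items()
    }


if __name__ == "__main__":
    # Ground truths roughly 0, 11, 111 and 1112 km from a fixed prediction.
    truths = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (10.0, 0.0)]
    preds = [(0.0, 0.0)] * 4
    print(accuracy_at_thresholds(preds, truths))
    # {'street': 25.0, 'city': 50.0, 'region': 75.0, 'country': 75.0, 'continent': 100.0}
```

Because the levels are nested radii, the percentages can only stay equal or grow from street to continent, which is the pattern visible in every row of the table.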
