Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

Lukas Haas, Silas Alberti, Michal Skreta

2023-02-01 | Meta-Learning | Generalized Zero-Shot Learning | Photo geolocation estimation | Zero-Shot Learning

Abstract

Image geolocalization is the challenging task of predicting the geographic coordinates of origin for a given photo. It is an unsolved problem relying on the ability to combine visual clues with general knowledge about the world to make accurate predictions across geographies. We present StreetCLIP (https://huggingface.co/geolocal/StreetCLIP), a robust, publicly available foundation model not only achieving state-of-the-art performance on multiple open-domain image geolocalization benchmarks but also doing so in a zero-shot setting, outperforming supervised models trained on more than 4 million images. Our method introduces a meta-learning approach for generalized zero-shot learning by pretraining CLIP from synthetic captions, grounding CLIP in a domain of choice. We show that our method effectively transfers CLIP's generalized zero-shot capabilities to the domain of image geolocalization, improving in-domain generalized zero-shot performance without finetuning StreetCLIP on a fixed set of classes.
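
The zero-shot setting described in the abstract can be illustrated with a minimal sketch: candidate places are turned into text captions, and the caption whose embedding is most similar to the image embedding is chosen. Everything specific here is an assumption for illustration: the caption template is hypothetical (the abstract does not give the authors' wording), and the toy three-dimensional vectors stand in for the outputs of real CLIP image and text encoders.

```python
import math

# Hypothetical caption template; the abstract does not state the authors' wording.
TEMPLATE = "A street-level photo taken in {place}."


def make_captions(places):
    """Turn candidate place names into text prompts, one per candidate class."""
    return [TEMPLATE.format(place=p) for p in places]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(image_vec, caption_vecs, places):
    """Return the place whose caption embedding best matches the image embedding."""
    scores = [cosine(image_vec, c) for c in caption_vecs]
    best = max(range(len(places)), key=scores.__getitem__)
    return places[best], scores


places = ["France", "Japan", "Brazil"]
captions = make_captions(places)

# Toy embeddings (assumption): a real pipeline would encode `captions` and the
# image with a CLIP-style model instead of using hand-written vectors.
caption_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_vec = [0.2, 0.9, 0.1]

prediction, scores = zero_shot_predict(image_vec, caption_vecs, places)
print(prediction)
```

Because the candidate list is supplied at inference time rather than fixed during training, the same procedure can be rerun with a different set of places without retraining.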

Results

| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Image Classification | Im2GPS3k | City level (25 km) | 22.4 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS3k | Region level (200 km) | 37.4 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS3k | Country level (750 km) | 61.3 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS3k | Continent level (2500 km) | 80.4 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS | City level (25 km) | 28.3 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS | Region level (200 km) | 45.1 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS | Country level (750 km) | 74.7 | StreetCLIP (Zero-Shot) |
| Image Classification | Im2GPS | Continent level (2500 km) | 88.2 | StreetCLIP (Zero-Shot) |
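
The metrics above report the share of test images whose predicted location falls within a distance threshold (25 km, 200 km, 750 km, 2500 km) of the true location. A minimal sketch of that computation follows, assuming great-circle (haversine) distance on a spherical Earth of radius 6371 km; the function names and the two-image example are illustrative and not taken from the paper's evaluation code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

# Threshold names and distances as listed in the results.
THRESHOLDS_KM = {"City": 25, "Region": 200, "Country": 750, "Continent": 2500}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at_thresholds(preds, truths):
    """Percentage of predictions within each distance threshold of the truth."""
    dists = [haversine_km(*p, *t) for p, t in zip(preds, truths)]
    return {
        name: 100.0 * sum(d <= km for d in dists) / len(dists)
        for name, km in THRESHOLDS_KM.items()
    }


# Illustrative data: both photos were taken in Paris; one prediction is exact,
# the other points to London (a few hundred km away).
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
result = accuracy_at_thresholds(preds=[paris, london], truths=[paris, paris])
print(result)
```

In this toy case the London guess misses the city and region thresholds but falls inside the country and continent thresholds, which shows how a single prediction can count as correct at coarse scales and incorrect at fine ones.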
