Where in the World is this Image? Transformer-based Geo-localization in the Wild

Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

2022-04-29

geo-localization, Scene Recognition, Semantic Segmentation, Photo geolocation estimation

Abstract

Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include the huge diversity of images due to different environmental scenarios and the drastic variation in the appearance of the same location depending on the time of day, weather, and season. More importantly, the prediction is made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state of the art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.

Results

Task	Dataset	Metric	Accuracy (%)	Model
Image Classification	Im2GPS3k	Street level (1 km)	11.8	TransLocator
Image Classification	Im2GPS3k	City level (25 km)	31.1	TransLocator
Image Classification	Im2GPS3k	Region level (200 km)	46.7	TransLocator
Image Classification	Im2GPS3k	Country level (750 km)	58.9	TransLocator
Image Classification	Im2GPS3k	Continent level (2500 km)	80.1	TransLocator
Image Classification	YFCC26k	Street level (1 km)	7.2	TransLocator
Image Classification	YFCC26k	City level (25 km)	17.8	TransLocator
Image Classification	YFCC26k	Region level (200 km)	28.0	TransLocator
Image Classification	YFCC26k	Country level (750 km)	41.3	TransLocator
Image Classification	YFCC26k	Continent level (2500 km)	60.6	TransLocator
Image Classification	GWS15k	Street level (1 km)	0.5	TransLocator
Image Classification	GWS15k	City level (25 km)	1.1	TransLocator
Image Classification	GWS15k	Region level (200 km)	8.0	TransLocator
Image Classification	GWS15k	Country level (750 km)	25.5	TransLocator
Image Classification	GWS15k	Continent level (2500 km)	48.3	TransLocator
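
Each metric in the table above names a distance threshold, from 1 km (street level) to 2500 km (continent level). A common way to compute such scores is to measure the great-circle distance between the predicted and true coordinates and report the percentage of images that fall within each threshold. The following is a minimal sketch of that computation, not the authors' evaluation code. The function names, the use of the haversine formula, the mean Earth radius constant, and the toy coordinates are all assumptions made for illustration.

```python
import math

# Mean Earth radius in km (an assumption of this sketch; other values are in use).
EARTH_RADIUS_KM = 6371.0088

# Distance thresholds in km, taken from the metric names in the results table.
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_thresholds(predictions, ground_truth, thresholds=THRESHOLDS_KM):
    """Percentage of predictions within each distance threshold.

    predictions and ground_truth are equal-length sequences of (lat, lon)
    pairs in degrees.
    """
    if len(predictions) != len(ground_truth) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    distances = [
        haversine_km(p_lat, p_lon, g_lat, g_lon)
        for (p_lat, p_lon), (g_lat, g_lon) in zip(predictions, ground_truth)
    ]
    return {
        name: 100.0 * sum(d <= km for d in distances) / len(distances)
        for name, km in thresholds.items()
    }


if __name__ == "__main__":
    # Toy example: four hypothetical predictions for images taken in Paris.
    paris = (48.8566, 2.3522)
    preds = [
        paris,                # exact hit
        (48.9566, 2.3522),    # about 11 km north of Paris
        (51.5074, -0.1278),   # London
        (40.7128, -74.0060),  # New York
    ]
    for level, acc in accuracy_at_thresholds(preds, [paris] * 4).items():
        print(f"{level}: {acc:.1f}%")
```

Because the thresholds are nested, the accuracy can only stay the same or rise as the threshold grows, which matches the pattern of the values reported for each dataset.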
