Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


OpenStreetView-5M

CC-BY-SA · Introduced 2024-04-29

OpenStreetView-5M establishes a new open benchmark for geolocation by providing a large, open, and clean dataset. As detailed below, OpenStreetView-5M improves upon several limitations of current geolocation datasets.

Deep neural networks have historically been favored over other machine learning methods because they benefit from larger amounts of data. OSV-5M consists of 4,894,685 training and 210,122 test images, with a height of 512 pixels and an average width of 792±127 pixels.

Many geolocation datasets are restricted to a few cities or are significantly biased towards the Western world. In contrast, OpenStreetView-5M images are uniformly sampled on the globe, covering 70k cities and 225 countries and territories. The distribution of test images across countries has a normalized entropy of 0.78 (Wilcox 1967, Eq. 19), suggesting high diversity. Our train set has a normalized entropy of 0.67, which is comparable to the entropy of the distribution of the countries' areas (0.71).
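
The normalized entropy quoted above is the Shannon entropy of the country distribution divided by its maximum value, log k, so it falls in [0, 1] with 1 meaning a perfectly uniform distribution. A minimal sketch of the computation (the function name and use of the natural logarithm are illustrative, not the authors' code):

```python
import math
from collections import Counter

def normalized_entropy(labels):
    """Shannon entropy of the label distribution divided by log(k),
    where k is the number of distinct labels (cf. Wilcox 1967, Eq. 19).
    Returns a value in [0, 1]; 1 means perfectly uniform."""
    counts = Counter(labels)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)

# A uniform distribution over four countries is maximally diverse:
print(round(normalized_entropy(["FR", "US", "JP", "BR"]), 6))  # 1.0
```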

OpenStreetView-5M is based on the crowd-sourced street view images of Mapillary which follow the CC-BY-SA license: free of use with attribution.

We estimate through manual inspection of 4500 images that 96.1% (±0.57%) of the images in the OpenStreetView-5M dataset are localizable, with a 95% confidence level. Among the weakly or non-localizable images, 70% (2.7% total) are low-quality: under- or over-exposed, blurry, or rotated; 30% (1.2% total) are poorly framed, indoor, or in tunnels.
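
The ±0.57% margin is consistent with a standard normal-approximation confidence interval for a binomial proportion at the 95% level. A sketch of the arithmetic (the function name is ours, not from the dataset's tooling):

```python
import math

def binomial_ci_halfwidth(p_hat, n, z=1.96):
    """Normal-approximation half-width of a binomial proportion CI:
    z * sqrt(p_hat * (1 - p_hat) / n), with z = 1.96 for 95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# 96.1% of 4,500 manually inspected images judged localizable:
hw = binomial_ci_halfwidth(0.961, 4500)
print(f"±{hw:.2%}")  # ±0.57%
```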

Without carefully enforcing the spatial separation between train and test images, geolocation can reduce to place recognition. As our goal is to assess the capacity of models to learn robust geographical representations, we ensure that no image in the OSV-5M training set lies within a 1 km radius of any image in the test set.
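
Such a separation constraint can be checked with a great-circle distance test. A minimal sketch, assuming plain (lat, lon) coordinates in degrees; the function names are illustrative, and a real pipeline over 5M images would use a spatial index (e.g., a ball tree) rather than the brute-force scan shown here:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def far_from_test_set(train_point, test_points, min_km=1.0):
    """True if a candidate train image lies at least min_km from every test image."""
    return all(haversine_km(*train_point, *p) >= min_km for p in test_points)
```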

Street-view images are typically acquired by a limited number of camera sensors mounted on the top or front of a small fleet of vehicles assigned to a given region. This correlation between location, cars, and sensors can be exploited to simplify the geolocation task. Notoriously, players of the web-based geolocation game GeoGuessr can locate images from Ghana by spotting a piece of duct tape placed on the corner of the roof rack of the Google Street View car. OpenStreetView-5M tries to avoid this pitfall by ensuring that no image sequence (a continuous series of images acquired by the same user) appears in both training and test sets. While this might not prevent images taken with the same vehicle on different days from being in both sets, it limits such occurrences.
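
A sequence-disjoint split of this kind can be sketched as follows, assuming each image record carries a sequence_id field (the schema, function name, and test fraction are illustrative, not the dataset's actual pipeline):

```python
import random

def split_by_sequence(images, test_fraction=0.05, seed=0):
    """Assign whole sequences to either train or test, so that no sequence
    (a continuous series of images from one user) spans both splits.
    `images` is a list of dicts, each with a 'sequence_id' key."""
    sequence_ids = sorted({img["sequence_id"] for img in images})
    rng = random.Random(seed)
    rng.shuffle(sequence_ids)
    n_test = max(1, int(len(sequence_ids) * test_fraction))
    test_ids = set(sequence_ids[:n_test])
    train = [img for img in images if img["sequence_id"] not in test_ids]
    test = [img for img in images if img["sequence_id"] in test_ids]
    return train, test
```

Splitting at the sequence level (rather than per image) is what prevents near-duplicate frames from the same drive from leaking across the train/test boundary.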

Rich metadata beyond geographical coordinates can improve the robustness and versatility of geolocation models. Each image in our dataset is associated with four tiers of administrative data: country, region (e.g., state), area (e.g., county), and the nearest city. Note that areas are not defined for one-third of the dataset.

We also associate each image with a set of additional information: land cover, climate, soil type, driving side, and the distance to the sea at the image's location.

Benchmarks

4K 60Fps / Geoscore
Image Classification / Geoscore

Statistics

Papers: 2
Benchmarks: 2

Links

Homepage

Tasks

4K 60Fps
Image Classification
Photo geolocation estimation