Gabriele Berton, Carlo Masone, Barbara Caputo
Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform in a real-world, city-wide VG application, we build San Francisco eXtra Large (SF-XL), a new dataset that covers a whole city, provides a wide range of challenging cases, and is 30x larger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets, so we design a new, highly scalable training technique called CosPlace, which casts training as a classification problem and thereby avoids the expensive mining required by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state of the art, CosPlace requires roughly 80% less GPU memory at train time and achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. The dataset, code, and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Visual Place Recognition | Nardo-Air R | Recall@1 | 91.55 | CosPlace |
| Visual Place Recognition | Oxford RobotCar Dataset | Recall@1 | 91.1 | CosPlace |
| Visual Place Recognition | SF-XL test v1 | Recall@1 | 64.7 | CosPlace |
| Visual Place Recognition | SF-XL test v1 | Recall@5 | 73.3 | CosPlace |
| Visual Place Recognition | SF-XL test v1 | Recall@10 | 76.6 | CosPlace |
| Visual Place Recognition | Mid-Atlantic Ridge | Recall@1 | 20.79 | CosPlace |
| Visual Place Recognition | St Lucia | Recall@1 | 99.59 | CosPlace |
| Visual Place Recognition | St Lucia | Recall@5 | 99.9 | CosPlace |
| Visual Place Recognition | Pittsburgh-250k-test | Recall@1 | 91.5 | CosPlace |
| Visual Place Recognition | Pittsburgh-250k-test | Recall@5 | 96.9 | CosPlace |
| Visual Place Recognition | Pittsburgh-250k-test | Recall@10 | 97.9 | CosPlace |
| Visual Place Recognition | Hawkins | Recall@1 | 31.36 | CosPlace |
| Visual Place Recognition | Laurel Caverns | Recall@1 | 24.11 | CosPlace |
| Visual Place Recognition | Gardens Point | Recall@1 | 74 | CosPlace |
| Visual Place Recognition | Pittsburgh-30k-test | Recall@1 | 90.45 | CosPlace |
| Visual Place Recognition | Pittsburgh-30k-test | Recall@1 | 90.4 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | Pittsburgh-30k-test | Recall@5 | 95.7 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | Tokyo247 | Recall@1 | 82.2 | CosPlace |
| Visual Place Recognition | Tokyo247 | Recall@5 | 95.9 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | Tokyo247 | Recall@10 | 96.5 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | SF-XL test v2 | Recall@1 | 83.4 | CosPlace |
| Visual Place Recognition | SF-XL test v2 | Recall@5 | 91.6 | CosPlace |
| Visual Place Recognition | SF-XL test v2 | Recall@10 | 94.1 | CosPlace |
| Visual Place Recognition | VP-Air | Recall@1 | 8.12 | CosPlace |
| Visual Place Recognition | Mapillary val | Recall@1 | 86.7 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | Mapillary val | Recall@5 | 92.1 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | Mapillary val | Recall@10 | 93.4 | CosPlace (ResNet-101 2048-D) |
| Visual Place Recognition | Mapillary val | Recall@5 | 89.9 | CosPlace |
| Visual Place Recognition | Mapillary val | Recall@10 | 91.8 | CosPlace |
| Visual Place Recognition | 17 Places | Recall@1 | 61.08 | CosPlace |
| Visual Place Recognition | MSLS | Recall@1 | 79.6 | CosPlace |
| Visual Place Recognition | Baidu Mall | Recall@1 | 41.62 | CosPlace |
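
The Recall@N figures in the table can be read as the percentage of queries for which at least one of the top N retrieved database images lies close enough to the query's true position. The sketch below illustrates one way such a metric might be computed from image descriptors. It is not the authors' evaluation code: the function name `recall_at_n`, the 25 m distance threshold, the use of cosine similarity, and the toy data are all assumptions made for illustration, and individual benchmarks may define a correct match differently.

```python
import numpy as np


def recall_at_n(query_desc, db_desc, query_pos, db_pos, ns=(1, 5, 10), threshold_m=25.0):
    """Hypothetical Recall@N for retrieval-based geo-localization.

    query_desc, db_desc: (Q, D) and (M, D) descriptor arrays.
    query_pos, db_pos:   (Q, 2) and (M, 2) positions in metres.
    threshold_m:         assumed distance below which a retrieval counts as correct.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sims = q @ d.T  # (Q, M) similarity matrix

    # Indices of the max(ns) most similar database images, best first.
    max_n = max(ns)
    ranked = np.argsort(-sims, axis=1)[:, :max_n]

    # Distance from each query to each of its retrieved database images.
    dists = np.linalg.norm(db_pos[ranked] - query_pos[:, None, :], axis=2)
    correct = dists <= threshold_m

    # A query is a hit at N if any of its first N retrievals is correct.
    return {n: 100.0 * correct[:, :n].any(axis=1).mean() for n in ns}


# Toy data (invented for illustration): three database images, two queries.
db_desc = np.eye(3)
db_pos = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]])
query_desc = np.array([[1.0, 0.1, 0.0], [0.9, 1.0, 0.0]])
query_pos = np.array([[5.0, 0.0], [0.0, 10.0]])

recalls = recall_at_n(query_desc, db_desc, query_pos, db_pos, ns=(1, 2))
print(recalls)
```

In this toy case the first query retrieves its true neighbour at rank 1, while the second only finds it at rank 2, so Recall@1 is 50% and Recall@2 is 100%. For a city-scale database, the dense similarity matrix would normally be replaced by a nearest-neighbour index, which is where smaller descriptors reduce memory and search cost.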