Tavis Shore, Simon Hadfield, Oscar Mendez
Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. Cross-view matching provides localisation from geo-referenced images alone, without the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap, such as applying polar transforms to aerial images or synthesising images between perspectives. However, these approaches generally rely on a 360° field of view, limiting their real-world feasibility. We propose BEV-CV, an approach introducing two key novelties aimed at improving the real-world viability of cross-view geo-localisation. Firstly, ground-level images are brought into a semantic Bird's-Eye-View (BEV) before their embeddings are matched, allowing direct comparison with aerial image representations. Secondly, we adapt datasets into an application-realistic format: limited Field-of-View (FoV) images aligned to the vehicle's direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates on 70° crops of CVUSA and CVACT by 23% and 24% respectively. It also decreases computational requirements, reducing floating point operations below those of previous works and decreasing embedding dimensionality by 33%, which together allow faster localisation.
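The limited-FoV dataset format can be illustrated with a minimal sketch that cuts a heading-aligned window out of a 360° panorama. It assumes, hypothetically, that the panorama is equirectangular with azimuth varying linearly across its width and column 0 at 0°; the function name `crop_fov` and its parameters are illustrative and are not taken from the BEV-CV code.

```python
import numpy as np


def crop_fov(panorama: np.ndarray, fov_deg: float, heading_deg: float = 0.0) -> np.ndarray:
    """Cut a limited field-of-view window out of a 360-degree panorama.

    Hypothetical assumptions: the image width spans 360 degrees of azimuth
    linearly, column 0 is azimuth 0, and `heading_deg` is the azimuth at the
    centre of the crop (e.g. the vehicle's direction of travel).
    Works for (H, W) or (H, W, C) arrays; the crop is taken along axis 1.
    """
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    width = panorama.shape[1]
    crop_w = int(round(width * fov_deg / 360.0))  # columns covered by the FoV
    centre = int(round(width * (heading_deg % 360.0) / 360.0))  # column at the heading
    start = centre - crop_w // 2
    cols = np.arange(start, start + crop_w)
    # mode="wrap" lets the window cross the 0/360-degree seam of the panorama
    return np.take(panorama, cols, axis=1, mode="wrap")
```

Using `np.take(..., mode="wrap")` rather than plain slicing keeps the crop contiguous when the window straddles the seam, e.g. a 90° crop centred on heading 0° takes its left half from the right edge of the panorama.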
| Task | Dataset | Model | Top-1 | Top-5 | Top-10 | Top-1% | R@5 |
|---|---|---|---|---|---|---|---|
| Camera Localization | CVUSA (90° FoV) | DSM [Shi2020WhereAI] | 33.66 | – | – | – | 51.7 |
| Camera Localization | CVUSA (90° FoV) | BEV-CV | 32.11 | 58.36 | 69.06 | 92.99 | – |
| Camera Localization | CVUSA (90° FoV) | L2LTR [Yang2021CrossviewGW] | 25.21 | – | – | – | 51.9 |
| Camera Localization | CVUSA (90° FoV) | GAL | 22.54 | – | – | – | – |
| Camera Localization | CVUSA (90° FoV) | TransGeo [Zhu2022TransGeoTI] | 21.96 | 45.35 | 56.49 | 86.8 | – |
| Camera Localization | CVUSA (90° FoV) | GeoDTR [zhang2023crossview] | 15.21 | 39.32 | 52.27 | 88.72 | – |
| Camera Localization | CVUSA (90° FoV) | CVFT | 4.8 | – | – | – | – |
| Camera Localization | CVUSA (90° FoV) | CVM | 2.76 | – | – | – | – |
| Camera Localization | CVUSA (70° FoV) | BEV-CV | 27.4 | 52.94 | 64.47 | 90.94 | – |
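The Top-k and Top-1% figures in the table are retrieval recalls: the percentage of ground-level queries whose true aerial match ranks within the top k candidates. The sketch below shows one way to compute them from embeddings. It assumes, hypothetically, that row i of each matrix is a matching pair and that similarity is the dot product of L2-normalised embeddings. For Top-1% it uses k = floor(0.01·N) + 1, one convention seen in public cross-view code; BEV-CV's exact evaluation protocol may differ.

```python
import numpy as np


def recall_at_k(ground: np.ndarray, aerial: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Top-k and Top-1% retrieval recall (in percent) for cross-view matching.

    Hypothetical assumptions: ground[i] and aerial[i] form a matching pair,
    and similarity is the cosine of the two embeddings.
    """
    g = ground / np.linalg.norm(ground, axis=1, keepdims=True)
    a = aerial / np.linalg.norm(aerial, axis=1, keepdims=True)
    sim = g @ a.T  # (N, N) ground-to-aerial similarity matrix
    true_sim = np.diag(sim)[:, None]  # similarity of each query to its true match
    # Rank = number of aerial candidates scoring strictly higher than the true
    # match (ties are resolved in the query's favour).
    ranks = (sim > true_sim).sum(axis=1)
    n = len(ranks)
    results = {f"Top-{k}": 100.0 * np.mean(ranks < k) for k in ks}
    k_pct = int(0.01 * n) + 1  # assumed Top-1% convention
    results["Top-1%"] = 100.0 * np.mean(ranks < k_pct)
    return results
```

Computing `sim` costs on the order of N × M × D multiply-adds for N queries, M aerial candidates and embedding dimension D, so the retrieval step scales linearly with D; a 33% smaller embedding cuts that part of the work by roughly a third.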