Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Yifang Yin, Andrei Georgescu, An Tran, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann
Street-view imagery provides us with novel experiences to explore different places remotely. Carefully calibrated street-view images (e.g. Google Street View) can be used for different downstream tasks, e.g. navigation and map feature extraction. As personal high-quality cameras have become much more affordable and portable, an enormous number of crowdsourced street-view images are uploaded to the internet, but commonly with missing or noisy sensor information. To prepare this hidden treasure for "ready-to-use" status, determining the missing location information and the camera orientation angles are two equally important tasks. Recent methods have achieved high performance on geo-localization of street-view images by cross-view matching with a pool of geo-referenced satellite imagery. However, most existing works focus more on geo-localization than on estimating the image orientation. In this work, we re-state the importance of finding fine-grained orientation for street-view images, formally define the problem, and provide a set of evaluation metrics to assess the quality of the orientation estimation. We propose two methods to improve the granularity of the orientation estimation, achieving 82.4% and 72.3% accuracy for images with estimated angle errors below 2 degrees on the CVUSA and CVACT datasets, respectively, corresponding to 34.9% and 28.2% absolute improvement compared to previous works. Integrating fine-grained orientation estimation in training also improves the performance on geo-localization, giving top-1 recall of 95.5%/85.5% and 86.8%/80.4% for orientation known/unknown tests on the two datasets.
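The headline orientation figure above is an accuracy under an angle-error threshold (errors below 2 degrees). The following is a minimal sketch of how such a metric could be computed. It assumes the error is the smallest absolute difference between predicted and true headings on a 360-degree circle; the abstract does not spell out the exact definition, and the function names `angular_error_deg` and `accuracy_within` are hypothetical, chosen only for illustration.

```python
def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two headings, in degrees (0..180).

    Assumption: headings wrap around at 360 degrees, so 359 and 1 are 2 apart.
    """
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def accuracy_within(preds_deg, truths_deg, threshold_deg=2.0):
    """Fraction of images whose angular error is strictly below threshold_deg."""
    if len(preds_deg) != len(truths_deg):
        raise ValueError("preds_deg and truths_deg must have the same length")
    if not preds_deg:
        raise ValueError("at least one prediction is required")
    hits = sum(
        angular_error_deg(p, t) < threshold_deg
        for p, t in zip(preds_deg, truths_deg)
    )
    return hits / len(preds_deg)
```

For example, `accuracy_within([359.5, 90.0, 181.0], [0.5, 95.0, 180.0])` counts the first and third predictions as correct (errors of 1 degree each) and the second as a miss (error of 5 degrees). The wrap-around handling matters here: a naive `abs(pred - true)` would score the first prediction as 359 degrees off.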
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Object Localization | CVUSA | Recall@1 | 95.43 | GeoDTR |
| Object Localization | CVUSA | Recall@5 | 98.86 | GeoDTR |
| Object Localization | CVUSA | Recall@10 | 99.34 | GeoDTR |
| Object Localization | CVUSA | Recall@top1% | 99.86 | GeoDTR |
| Object Localization | CVACT | Recall@1 | 86.21 | GeoDTR |
| Object Localization | CVACT | Recall@5 | 95.44 | GeoDTR |
| Object Localization | CVACT | Recall@10 | 96.72 | GeoDTR |
| Object Localization | CVACT | Recall@top1% | 98.77 | GeoDTR |
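The recall metrics in the table can be read as follows: each street-view query is ranked against a gallery of satellite images, and a query counts as a hit if its true match appears within the top K results. The sketch below illustrates this under stated assumptions. The helper names are hypothetical, higher similarity is assumed to mean a better match, and K for the top-1% metric is taken as 1% of the gallery size rounded up (published implementations may round differently).

```python
import math


def rank_of_match(similarities, true_index):
    """1-based rank of the true reference image among all gallery scores."""
    true_score = similarities[true_index]
    # Count gallery entries that score strictly higher than the true match.
    return 1 + sum(s > true_score for s in similarities)


def recall_at_k(ranks, k):
    """Fraction of queries whose true match is ranked within the top k."""
    if not ranks:
        raise ValueError("at least one query rank is required")
    return sum(r <= k for r in ranks) / len(ranks)


def recall_at_top_percent(ranks, gallery_size, percent=1.0):
    """Recall with k set to a percentage of the gallery size (assumed rounded up)."""
    k = max(1, math.ceil(gallery_size * percent / 100.0))
    return recall_at_k(ranks, k)
```

With ranks `[1, 2, 5, 11]` over a gallery of 1000 images, `recall_at_k(ranks, 1)` gives 0.25 and `recall_at_top_percent(ranks, 1000)` uses K = 10 and gives 0.75. These functions return fractions; the table reports the same quantity as a percentage.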