Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo, Javier Civera

Published: 2023-11-27. Venue: CVPR 2024. Tasks: Visual Place Recognition, Re-Ranking.

Abstract

The task of Visual Place Recognition (VPR) aims to match a query image against references from an extensive database of images from different places, relying solely on visual cues. State-of-the-art pipelines focus on the aggregation of features extracted from a deep backbone, in order to form a global descriptor for each image. In this context, we introduce SALAD (Sinkhorn Algorithm for Locally Aggregated Descriptors), which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem. In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative, enhancing the overall descriptor quality. Additionally, we leverage and fine-tune DINOv2 as a backbone, which provides enhanced description power for the local features, and dramatically reduces the required training time. As a result, our single-stage method not only surpasses single-stage baselines in public VPR datasets, but also surpasses two-stage methods that add a re-ranking with significantly higher cost. Code and models are available at https://github.com/serizba/salad.
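The assignment step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general idea of a Sinkhorn-normalized soft assignment with an extra dustbin column, followed by a weighted aggregation that ignores the mass sent to the dustbin. The function names, the fixed `dustbin_score`, the uniform column marginals, and the normalization order are all assumptions made for illustration.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_assign(scores, dustbin_score=1.0, n_iters=50):
    """Soft-assign n features to m clusters plus one dustbin (hypothetical helper).

    scores: (n, m) array of feature-to-cluster scores.
    Returns an (n, m + 1) assignment matrix; the last column is the dustbin.
    """
    n, m = scores.shape
    # Append a dustbin column so uninformative features have somewhere to go.
    aug = np.concatenate([scores, np.full((n, 1), dustbin_score)], axis=1)
    log_mu = np.zeros(n)                          # each feature carries unit mass
    log_nu = np.full(m + 1, np.log(n / (m + 1)))  # assumption: equal mass per column
    u = np.zeros(n)
    v = np.zeros(m + 1)
    for _ in range(n_iters):
        # Alternate column and row normalization in the log domain.
        v = log_nu - logsumexp(aug + u[:, None], axis=0)
        u = log_mu - logsumexp(aug + v[None, :], axis=1)
    return np.exp(aug + u[:, None] + v[None, :])


def aggregate(features, assignment):
    """Build a global descriptor from (n, d) features and an (n, m + 1) assignment."""
    weights = assignment[:, :-1]                  # drop the dustbin column
    clusters = weights.T @ features               # (m, d) weighted sums per cluster
    clusters = clusters / (np.linalg.norm(clusters, axis=1, keepdims=True) + 1e-12)
    desc = clusters.reshape(-1)
    return desc / (np.linalg.norm(desc) + 1e-12)
```

Because the row update runs last in each iteration, every feature's assignment sums to one across the clusters and the dustbin; whatever share lands in the dustbin simply does not contribute to the descriptor.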

Results

Task	Dataset	Metric	Value	Model
Visual Place Recognition	Nordland	Recall@1	85.2	DINOv2 SALAD (1-frame thr.)
Visual Place Recognition	Nordland	Recall@5	98.5	DINOv2 SALAD (1-frame thr.)
Visual Place Recognition	Pittsburgh-250k-test	Recall@1	95.1	DINOv2 SALAD
Visual Place Recognition	Pittsburgh-250k-test	Recall@5	98.5	DINOv2 SALAD
Visual Place Recognition	Pittsburgh-250k-test	Recall@10	99.1	DINOv2 SALAD
Visual Place Recognition	SPED	Recall@1	92.1	DINOv2 SALAD
Visual Place Recognition	SPED	Recall@5	96.2	DINOv2 SALAD
Visual Place Recognition	SPED	Recall@10	96.5	DINOv2 SALAD
Visual Place Recognition	Mapillary val	Recall@1	92.2	DINOv2 SALAD
Visual Place Recognition	Mapillary val	Recall@5	96.4	DINOv2 SALAD
Visual Place Recognition	Mapillary val	Recall@10	97	DINOv2 SALAD
Visual Place Recognition	Mapillary test	Recall@1	75	DINOv2 SALAD
Visual Place Recognition	Mapillary test	Recall@5	88.8	DINOv2 SALAD
Visual Place Recognition	Mapillary test	Recall@10	91.3	DINOv2 SALAD
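
The Recall@K figures in the table follow the usual retrieval convention: a query counts as correct if at least one of its top-K retrieved references is a true match. The sketch below shows one common way to compute this from global descriptors. It is a hedged illustration, not the evaluation code behind the numbers above; the function name, the dot-product similarity, and the representation of ground truth as one set of reference indices per query are assumptions. What counts as a true match (for example a distance or frame threshold) is defined per dataset.

```python
import numpy as np


def recall_at_k(query_desc, db_desc, positives, ks=(1, 5, 10)):
    """Percentage of queries with a true match among the top-k references.

    query_desc: (q, d) query descriptors.
    db_desc:    (r, d) reference descriptors.
    positives:  list of q sets, each holding the reference indices that
                count as correct for that query (hypothetical format).
    """
    sims = query_desc @ db_desc.T           # higher means more similar
    ranked = np.argsort(-sims, axis=1)      # best reference first
    recalls = {}
    for k in ks:
        hits = sum(
            1
            for q, pos in enumerate(positives)
            if pos & set(ranked[q, :k].tolist())
        )
        recalls[k] = 100.0 * hits / len(positives)
    return recalls


# Toy usage: two queries against three references.
db = np.eye(3)
queries = np.array([[0.9, 0.3, 0.1],
                    [0.2, 0.9, 0.3]])
truth = [{0}, {2}]
print(recall_at_k(queries, db, truth, ks=(1, 2)))
```

In the toy example the first query retrieves its match at rank 1, while the second only finds it at rank 2, so recall rises from 50 to 100 as K grows from 1 to 2.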
