


Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo, Javier Civera

Date: 2023-11-27
Venue: CVPR 2024
Tasks: Visual Place Recognition, Re-Ranking

Abstract

The task of Visual Place Recognition (VPR) aims to match a query image against references from an extensive database of images from different places, relying solely on visual cues. State-of-the-art pipelines focus on the aggregation of features extracted from a deep backbone, in order to form a global descriptor for each image. In this context, we introduce SALAD (Sinkhorn Algorithm for Locally Aggregated Descriptors), which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem. In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative, enhancing the overall descriptor quality. Additionally, we leverage and fine-tune DINOv2 as a backbone, which provides enhanced description power for the local features, and dramatically reduces the required training time. As a result, our single-stage method not only surpasses single-stage baselines in public VPR datasets, but also surpasses two-stage methods that add a re-ranking with significantly higher cost. Code and models are available at https://github.com/serizba/salad.
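The assignment step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal log-domain Sinkhorn example under stated assumptions. The function names, the uniform marginals, the fixed `dustbin_score`, and the random stand-in scores are all hypothetical choices made for illustration. The abstract does not specify how the scores or marginals are defined.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, n_iters=100, reg=1.0):
    """Soft-assign n features to m clusters plus one dustbin column.

    Assumption: uniform marginals over features (rows) and over the
    m + 1 columns. The paper may use different marginals.
    """
    n, m = scores.shape
    # Append the dustbin as an extra column with a constant score.
    aug = np.hstack([scores, np.full((n, 1), dustbin_score)]) / reg
    log_mu = np.full(n, -np.log(n))          # row marginals
    log_nu = np.full(m + 1, -np.log(m + 1))  # column marginals
    u = np.zeros(n)
    v = np.zeros(m + 1)
    for _ in range(n_iters):
        # Alternate row and column normalization in the log domain.
        u = log_mu - _logsumexp(aug + v[None, :], axis=1)
        v = log_nu - _logsumexp(aug + u[:, None], axis=0)
    # Columns match log_nu exactly after the last update; rows match
    # log_mu approximately once the iterations have converged.
    return np.exp(aug + u[:, None] + v[None, :])


def aggregate(features, assignment):
    # Drop the dustbin column, then pool features per cluster (VLAD-like,
    # simplified: no residuals to cluster centers, no global token).
    kept = assignment[:, :-1]
    desc = (kept.T @ features).ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)


rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 8))    # 20 local features of dimension 8
scores = rng.normal(size=(20, 4))   # stand-in feature-to-cluster scores
P = sinkhorn_with_dustbin(scores)
desc = aggregate(feats, P)
```

Because the normalization alternates between rows and columns, each entry of `P` reflects both feature-to-cluster and cluster-to-feature relations, and any mass a feature sends to the last column is discarded before aggregation.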

Results

Task | Dataset | Metric | Value | Model
Visual Place Recognition | Nordland | Recall@1 | 85.2 | DINOv2 SALAD (1-frame thr.)
Visual Place Recognition | Nordland | Recall@5 | 98.5 | DINOv2 SALAD (1-frame thr.)
Visual Place Recognition | Pittsburgh-250k-test | Recall@1 | 95.1 | DINOv2 SALAD
Visual Place Recognition | Pittsburgh-250k-test | Recall@5 | 98.5 | DINOv2 SALAD
Visual Place Recognition | Pittsburgh-250k-test | Recall@10 | 99.1 | DINOv2 SALAD
Visual Place Recognition | SPED | Recall@1 | 92.1 | DINOv2 SALAD
Visual Place Recognition | SPED | Recall@5 | 96.2 | DINOv2 SALAD
Visual Place Recognition | SPED | Recall@10 | 96.5 | DINOv2 SALAD
Visual Place Recognition | Mapillary val | Recall@1 | 92.2 | DINOv2 SALAD
Visual Place Recognition | Mapillary val | Recall@5 | 96.4 | DINOv2 SALAD
Visual Place Recognition | Mapillary val | Recall@10 | 97 | DINOv2 SALAD
Visual Place Recognition | Mapillary test | Recall@1 | 75 | DINOv2 SALAD
Visual Place Recognition | Mapillary test | Recall@5 | 88.8 | DINOv2 SALAD
Visual Place Recognition | Mapillary test | Recall@10 | 91.3 | DINOv2 SALAD
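
For readers unfamiliar with the metric, Recall@K in retrieval is commonly computed as the percentage of queries for which at least one of the top-K retrieved references is a correct match. The sketch below illustrates that definition on toy data. The function name `recall_at_k`, the dot-product similarity, and the precomputed `positives` lists are assumptions for illustration; each benchmark defines its own ground-truth rule for what counts as a correct match, and this is not the evaluation code behind the numbers above.

```python
import numpy as np


def recall_at_k(query_desc, ref_desc, positives, ks=(1, 5, 10)):
    """Percentage of queries with at least one true match in the top-k.

    positives[i] lists the reference indices that count as correct for
    query i (assumed to be given by the benchmark's ground truth).
    """
    sims = query_desc @ ref_desc.T                      # similarity matrix
    ranked = np.argsort(-sims, axis=1)[:, : max(ks)]    # best matches first
    recalls = {}
    for k in ks:
        hits = sum(
            bool(np.isin(ranked[i, :k], positives[i]).any())
            for i in range(len(positives))
        )
        recalls[k] = 100.0 * hits / len(positives)
    return recalls


# Toy data: 4 one-hot reference descriptors and 2 queries.
refs = np.eye(4)
queries = np.array([
    [1.0, 0.0, 0.0, 0.0],   # closest to reference 0
    [0.0, 0.9, 1.0, 0.0],   # closest to reference 2, then reference 1
])
positives = [[0], [1]]
result = recall_at_k(queries, refs, positives, ks=(1, 2))
```

In this toy case the second query only finds its correct reference at rank 2, so Recall@1 is lower than Recall@2.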
