BoQ: A Place is Worth a Bag of Learnable Queries

Amar Ali-bey, Brahim Chaib-Draa, Philippe Giguère

2024-05-12 | CVPR 2024 | Image Similarity Search, Visual Place Recognition, Retrieval

Abstract

In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.
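
The core idea in the abstract, a fixed set of learnable queries that probe the backbone features through cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It is a single-head, single-layer simplification, and the function name `bag_of_queries`, the projection matrices `w_k` and `w_v`, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bag_of_queries(features, queries, w_k, w_v):
    """Aggregate local features with a shared set of learnable queries.

    features: (N, D) local features of one image from the backbone
    queries:  (Q, D) learnable parameters, identical for every image
    w_k, w_v: (D, D) key and value projections (hypothetical names)
    """
    keys = features @ w_k
    values = features @ w_v
    # Cross-attention: queries come from learned parameters,
    # keys and values come from the input features.
    scores = queries @ keys.T / np.sqrt(queries.shape[1])  # (Q, N)
    attn = softmax(scores, axis=-1)   # one weight map per query
    aggregated = attn @ values        # (Q, D)
    descriptor = aggregated.reshape(-1)
    descriptor = descriptor / np.linalg.norm(descriptor)  # L2-normalise
    return descriptor, attn

rng = np.random.default_rng(0)
N, D, Q = 400, 64, 8  # assumed sizes: 400 locations, 64 channels, 8 queries
queries = rng.normal(size=(Q, D))            # would be trained in practice
w_k = rng.normal(size=(D, D)) / np.sqrt(D)
w_v = rng.normal(size=(D, D)) / np.sqrt(D)
features = rng.normal(size=(N, D))           # stand-in for backbone output

descriptor, attn = bag_of_queries(features, queries, w_k, w_v)
print(descriptor.shape, attn.shape)
```

Because the queries do not depend on the input, each row of `attn` shows where one particular query looks in the image, which is one way to read the interpretability claim in the abstract.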

Results

Task	Dataset	Metric	Value	Model
Visual Place Recognition	SVOX-Snow	Recall@1	98.7	BoQ (ResNet-50)
Visual Place Recognition	AmsterTime	Recall@1	63	BoQ
Visual Place Recognition	AmsterTime	Recall@10	85.1	BoQ
Visual Place Recognition	AmsterTime	Recall@5	81.6	BoQ
Visual Place Recognition	AmsterTime	Recall@1	52.2	BoQ (ResNet-50)
Visual Place Recognition	Nordland	Recall@1	90.6	BoQ
Visual Place Recognition	Nordland	Recall@10	97.5	BoQ
Visual Place Recognition	Nordland	Recall@5	96	BoQ
Visual Place Recognition	Nordland	Recall@1	83.1	BoQ (ResNet-50)
Visual Place Recognition	San Francisco Landmark Dataset	Recall@1	93.6	BoQ
Visual Place Recognition	San Francisco Landmark Dataset	Recall@10	96.5	BoQ
Visual Place Recognition	San Francisco Landmark Dataset	Recall@5	95.8	BoQ
Visual Place Recognition	SVOX-Night	Recall@1	87.1	BoQ (ResNet-50)
Visual Place Recognition	St Lucia	Recall@1	100	BoQ (DINOv2)
Visual Place Recognition	St Lucia	Recall@5	100	BoQ (DINOv2)
Visual Place Recognition	St Lucia	Recall@10	100	BoQ
Visual Place Recognition	St Lucia	Recall@5	100	BoQ
Visual Place Recognition	Pittsburgh-250k-test	Recall@1	96.6	BoQ
Visual Place Recognition	Pittsburgh-250k-test	Recall@10	99.5	BoQ
Visual Place Recognition	Pittsburgh-250k-test	Recall@5	99.1	BoQ
Visual Place Recognition	Pittsburgh-250k-test	Recall@1	95	BoQ (ResNet-50)
Visual Place Recognition	Pittsburgh-250k-test	Recall@10	99.1	BoQ (ResNet-50)
Visual Place Recognition	Pittsburgh-250k-test	Recall@5	98.5	BoQ (ResNet-50)
Visual Place Recognition	SPED	Recall@1	92.5	BoQ
Visual Place Recognition	SPED	Recall@10	96.7	BoQ
Visual Place Recognition	SPED	Recall@5	95.9	BoQ
Visual Place Recognition	SPED	Recall@1	86.5	BoQ (ResNet-50)
Visual Place Recognition	SPED	Recall@10	95.7	BoQ (ResNet-50)
Visual Place Recognition	SPED	Recall@5	93.4	BoQ (ResNet-50)
Visual Place Recognition	Pittsburgh-30k-test	Recall@1	93.7	BoQ
Visual Place Recognition	Pittsburgh-30k-test	Recall@10	97.9	BoQ
Visual Place Recognition	Pittsburgh-30k-test	Recall@5	97.1	BoQ
Visual Place Recognition	Pittsburgh-30k-test	Recall@1	92.4	BoQ (ResNet-50)
Visual Place Recognition	Tokyo247	Recall@1	98.1	BoQ
Visual Place Recognition	Tokyo247	Recall@10	98.7	BoQ
Visual Place Recognition	Tokyo247	Recall@5	98.1	BoQ
Visual Place Recognition	Mapillary val	Recall@1	93.8	BoQ
Visual Place Recognition	Mapillary val	Recall@10	97	BoQ
Visual Place Recognition	Mapillary val	Recall@5	96.8	BoQ
Visual Place Recognition	Mapillary val	Recall@1	91.2	BoQ (ResNet-50)
Visual Place Recognition	Mapillary val	Recall@10	96.1	BoQ (ResNet-50)
Visual Place Recognition	Mapillary val	Recall@5	95.3	BoQ (ResNet-50)
Visual Place Recognition	SVOX-Rain	Recall@1	96.2	BoQ (ResNet-50)
Visual Place Recognition	Mapillary test	Recall@1	79	BoQ
Visual Place Recognition	Mapillary test	Recall@10	92	BoQ
Visual Place Recognition	Mapillary test	Recall@5	90.3	BoQ
Visual Place Recognition	Eynsham	Recall@1	92.2	BoQ
Visual Place Recognition	Eynsham	Recall@10	96.4	BoQ
Visual Place Recognition	Eynsham	Recall@5	95.6	BoQ
Visual Place Recognition	Eynsham	Recall@1	91.3	BoQ (ResNet-50)
Visual Place Recognition	SVOX-Overcast	Recall@1	97.8	BoQ (ResNet-50)
Visual Place Recognition	SVOX-Sun	Recall@1	95.9	BoQ (ResNet-50)
Visual Place Recognition	Nordland* (2760 queries)	Recall@1	81.3	BoQ
Visual Place Recognition	Nordland* (2760 queries)	Recall@10	94.8	BoQ
Visual Place Recognition	Nordland* (2760 queries)	Recall@5	92.5	BoQ
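
The Recall@K values above are percentages of queries for which at least one correct database image appears among the top K retrieved results. A minimal sketch of that computation follows, assuming descriptors are compared by dot product and that the set of correct database indices per query is already known. The function name `recall_at_k` and the toy data are hypothetical, and the criterion for a "correct" match (usually a distance threshold) varies by benchmark.

```python
import numpy as np

def recall_at_k(query_desc, db_desc, positives, ks=(1, 5, 10)):
    """Percentage of queries with a correct match in the top k results.

    query_desc: (num_queries, D) query descriptors
    db_desc:    (num_db, D) database descriptors
    positives:  list of lists, correct database indices for each query
    """
    sims = query_desc @ db_desc.T                        # similarity matrix
    ranked = np.argsort(-sims, axis=1)[:, :max(ks)]      # best matches first
    recalls = {}
    for k in ks:
        hits = sum(
            1 for i, pos in enumerate(positives)
            if set(ranked[i, :k].tolist()) & set(pos)
        )
        recalls[k] = 100.0 * hits / len(positives)
    return recalls

# Toy data: four database descriptors and two queries.
db_desc = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
query_desc = np.array([[1.0, 0.2], [0.2, 1.0]])
positives = [[0], [0]]  # database image 0 is the correct match for both

result = recall_at_k(query_desc, db_desc, positives, ks=(1, 2))
print(result)  # {1: 50.0, 2: 100.0}
```

The second query ranks database image 1 first and image 0 second, so it counts as a miss at K=1 and a hit at K=2.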
