Cross-Batch Memory for Embedding Learning

Xun Wang, Haozhi Zhang, Weilin Huang, Matthew R. Scott

2019-12-14CVPR 2020 6Metric Learning Retrieval Image Retrieval

Paper PDF Code Code Code Code(official)Code

Abstract

Mining informative negative instances are of central importance to deep metric learning (DML), however this task is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. In this paper, we identify a "slow drift" phenomena by observing that the embedding features drift exceptionally slow even as the model parameters are updating throughout the training process. This suggests that the features of instances computed at preceding iterations can be used to considerably approximate their features extracted by the current model. We propose a cross-batch memory (XBM) mechanism that memorizes the embeddings of past iterations, allowing the model to collect sufficient hard negative pairs across multiple mini-batches - even over the whole dataset. Our XBM can be directly integrated into a general pair-based DML framework, where the XBM augmented DML can boost performance considerably. In particular, without bells and whistles, a simple contrastive loss with our XBM can have large R@1 improvements of 12%-22.5% on three large-scale image retrieval datasets, surpassing the most sophisticated state-of-the-art methods, by a large margin. Our XBM is conceptually simple, easy to implement - using several lines of codes, and is memory efficient - with a negligible 0.2 GB extra GPU memory. Code is available at: https://github.com/MalongTech/research-xbm.

Results

Task	Dataset	Metric	Value	Model
Image Retrieval	SOP	R@1	80.6	Cross-Batch Memory
Image Retrieval	In-Shop	R@1	91.3	Cross-Batch Memory

Related Papers

Unsupervised Ground Metric Learning2025-07-17 From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17 HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17 A Survey of Context Engineering for Large Language Models2025-07-17 MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval2025-07-17 FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval2025-07-17 Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization?2025-07-16 Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker2025-07-16