Hard negative examples are hard, but useful

Hong Xuan, Abby Stylianou, Xiaotong Liu, Robert Pless

2020-07-24ECCV 2020 8Metric Learning Semantic Similarity Semantic Textual Similarity Retrieval Image Retrieval

Abstract

Triplet loss is an extremely common approach to distance metric learning. Representations of images from the same class are optimized to be mapped closer together in an embedding space than representations of images from different classes. Much work on triplet losses focuses on selecting the most useful triplets of images to consider, with strategies that select dissimilar examples from the same class or similar examples from different classes. The consensus of previous research is that optimizing with the \textit{hardest} negative examples leads to bad training behavior. That's a problem -- these hardest negatives are literally the cases where the distance metric fails to capture semantic similarity. In this paper, we characterize the space of triplets and derive why hard negatives make triplet loss training fail. We offer a simple fix to the loss function and show that, with this fix, optimizing with hard negative examples becomes feasible. This leads to more generalizable features, and image retrieval results that outperform state of the art for datasets with high intra-class variance.

Results

Task	Dataset	Metric	Value	Model
Metric Learning	CARS196	R@1	73.2	SCT(64)
Metric Learning	CUB-200-2011	R@1	57.7	SCT(64)
Metric Learning	In-Shop	R@1	90	SCT(512)
Metric Learning	Stanford Online Products	R@1	81.6	SCT(512)

Related Papers

Unsupervised Ground Metric Learning2025-07-17 SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17 From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17 HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17 A Survey of Context Engineering for Large Language Models2025-07-17 MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval2025-07-17 FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval2025-07-17 Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization?2025-07-16