Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

Zhangchi Feng, Richong Zhang, Zhijie Nie

2024-04-17 · Large Language Model · Contrastive Learning · Retrieval · Zero-Shot Composed Image Retrieval (ZS-CIR) · Language Modelling · Image Retrieval

Paper · PDF · Code (official)

Abstract

The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modifying text. Advanced methods often use contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, the triplets required for CIR incur high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which limits the number of negatives available to the model. To address the lack of positives, we propose a data generation method that leverages a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces plenty of static representations of negatives to optimize the representation space rapidly. The two improvements stack effectively and are designed to be plug-and-play, so they can be applied to existing CIR models without changing their original architectures. Extensive experiments and ablation analysis demonstrate that our method effectively scales positives and negatives and achieves state-of-the-art results on both the FashionIQ and CIRR datasets. In addition, our method also performs well in zero-shot composed image retrieval, providing a new CIR solution for the low-resource scenario. Our code and data are released at https://github.com/BUAADreamer/SPN4CIR.
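To make the core objective concrete, the sketch below shows a standard InfoNCE-style contrastive loss with in-batch negatives, plus an optional bank of precomputed ("static") negative embeddings appended to the similarity matrix, in the spirit of the paper's second fine-tuning stage. This is an illustrative numpy sketch under our own naming (`info_nce_loss`, `extra_negatives` are hypothetical), not the authors' implementation.

```python
import numpy as np

def info_nce_loss(query, targets, extra_negatives=None, temperature=0.07):
    """InfoNCE loss with in-batch negatives and an optional static negative bank.

    query:   (B, D) composed-query embeddings (reference image + modifying text).
    targets: (B, D) target-image embeddings; targets[i] is the positive for
             query[i], and the other B-1 rows act as in-batch negatives.
    extra_negatives: optional (M, D) precomputed negative embeddings appended
             to enlarge the negative pool (illustrative stand-in for the
             paper's static negative representations).
    """
    # L2-normalize so dot products are cosine similarities.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = q @ t.T / temperature                                        # (B, B)
    if extra_negatives is not None:
        n = extra_negatives / np.linalg.norm(extra_negatives, axis=1, keepdims=True)
        logits = np.concatenate([logits, q @ n.T / temperature], axis=1)  # (B, B+M)
    # Row-wise log-softmax; positives lie on the diagonal of the first B columns.
    row_max = logits.max(axis=1, keepdims=True)
    log_softmax = logits - row_max - np.log(
        np.exp(logits - row_max).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    return -log_softmax[idx, idx].mean()
```

Because the static bank only adds columns to the softmax denominator, it can be swapped in without touching the encoder architecture, which matches the plug-and-play framing above.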

Results

Task                              Dataset     Metric                      Value   Model
Image Retrieval                   Fashion IQ  (Recall@10+Recall@50)/2     66.41   SPN4CIR (SPRC)
Image Retrieval                   Fashion IQ  Recall@10                   56.37   SPN4CIR (SPRC)
Image Retrieval                   CIRR        (Recall@5+Recall_subset@1)/2  82.69   SPN4CIR (SPRC)
Image Retrieval                   CIRR        Recall@10                   90.87   SPN4CIR (SPRC)
Image Retrieval                   CIRR        R@5                         65.42   SPN4CIR (SPN-CC)
Image Retrieval                   CIRR        Rsubset@1                   64.87   SPN4CIR (SPN-CC)
Composed Image Retrieval (CoIR)   CIRR        R@5                         65.42   SPN4CIR (SPN-CC)
Composed Image Retrieval (CoIR)   CIRR        Rsubset@1                   64.87   SPN4CIR (SPN-CC)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)