Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Zero-shot Composed Text-Image Retrieval

Yikun Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang, Weidi Xie

2023-06-12 | Tasks: Retrieval, Zero-Shot Composed Image Retrieval (ZS-CIR), Image Retrieval
Paper | PDF | Code (official)

Abstract

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's ability to express what they are looking for. We make the following contributions: (i) we propose a scalable pipeline that automatically constructs datasets for training CIR models by exploiting a large-scale dataset of image-text pairs, e.g., a subset of LAION-5B; (ii) we introduce a transformer-based adaptive aggregation model, TransAgg, which employs a simple yet efficient fusion mechanism to adaptively combine information from diverse modalities; (iii) we conduct extensive ablation studies to investigate the usefulness of our proposed data construction procedure and the effectiveness of the core components in TransAgg; (iv) when evaluated on publicly available benchmarks under the zero-shot scenario, i.e., training on the automatically constructed datasets and then directly conducting inference on target downstream datasets, e.g., CIRR and FashionIQ, our proposed approach either performs on par with or significantly outperforms existing state-of-the-art (SOTA) models. Project page: https://code-kunkun.github.io/ZS-CIR/
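
The abstract describes TransAgg's fusion only at a high level. As a rough illustration of what "adaptively combining information from diverse modalities" can mean, here is a minimal sketch in plain Python. It is not TransAgg itself: the transformer encoder is omitted, the features are toy 3-d vectors standing in for image and text embeddings, and `score_weights` is a hypothetical stand-in for learned parameters. Each modality feature gets an input-dependent softmax weight, the weighted sum forms the composed query, and gallery images are ranked by cosine similarity to it. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """L2-normalize a vector; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features: Sequence[Sequence[float]],
                  score_weights: Sequence[float]) -> Vector:
    """Fuse per-modality features (e.g. image, text) with input-dependent weights.

    `score_weights` is a hypothetical stand-in for learned parameters; a real
    model would also contextualize the features (e.g. with a transformer) first.
    """
    # One scalar score per modality feature, then softmax -> weights summing to 1.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum across modalities, dimension by dimension.
    fused = [sum(wt * f[d] for wt, f in zip(weights, features)) for d in range(dim)]
    return normalize(fused)


def rank_gallery(query: Sequence[float],
                 gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted by cosine similarity to the query, best first."""
    q = normalize(query)
    sims = [sum(a * b for a, b in zip(q, normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)


# Toy usage: 3-d "embeddings" for a reference image and a modification text.
image_feat = [1.0, 0.0, 0.0]
text_feat = [0.0, 1.0, 0.0]
query = adaptive_fuse([image_feat, text_feat], score_weights=[0.0, 2.0, 0.0])
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(rank_gallery(query, gallery))  # [1, 0, 2]
```

Here the text feature scores higher, so it receives most of the weight (about 0.88 versus 0.12), and the gallery item closest to the text direction ranks first. Softmax weighting is one simple choice that keeps the weights positive and summing to one; the paper's actual fusion mechanism may differ.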

Results

Task | Dataset | Metric | Value | Model
Image Retrieval | Fashion IQ | (Recall@10 + Recall@50)/2 | 44.75 | TransAgg (Laion-CIR-Combined)
Image Retrieval | CIRR | R@1 | 37.87 | TransAgg (Laion-CIR-Combined)
Image Retrieval | CIRR | R@5 | 68.88 | TransAgg (Laion-CIR-Combined)
Image Retrieval | CIRR | R@50 | 93.86 | TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR) | Fashion IQ | (Recall@10 + Recall@50)/2 | 44.75 | TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR) | CIRR | R@1 | 37.87 | TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR) | CIRR | R@5 | 68.88 | TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR) | CIRR | R@50 | 93.86 | TransAgg (Laion-CIR-Combined)
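
The Recall@K figures above are percentages of queries whose ground-truth target image appears among the top K retrieved gallery images. A minimal sketch of that computation follows, assuming exactly one target per query and precomputed rankings. The function names are hypothetical, and this is not the official CIRR or Fashion IQ evaluation code. In particular, published Fashion IQ numbers are typically averaged over the dataset's clothing categories, which this sketch does not do.

```python
from typing import Sequence


def recall_at_k(ranked_lists: Sequence[Sequence[str]],
                targets: Sequence[str], k: int) -> float:
    """Percentage of queries whose single ground-truth target is in the top-k results."""
    if len(ranked_lists) != len(targets) or not targets:
        raise ValueError("need exactly one target per query, and at least one query")
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return 100.0 * hits / len(targets)


def recall_10_50_average(ranked_lists: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """(Recall@10 + Recall@50) / 2, the summary metric in the Fashion IQ rows."""
    r10 = recall_at_k(ranked_lists, targets, 10)
    r50 = recall_at_k(ranked_lists, targets, 50)
    return (r10 + r50) / 2


# Toy run: 4 queries share one 60-image ranking; targets sit at ranks 1, 6, 21 and 56.
ranking = [f"img{i}" for i in range(60)]
ranked_lists = [ranking] * 4
targets = ["img0", "img5", "img20", "img55"]
for k in (1, 5, 10, 50):
    print(f"R@{k} = {recall_at_k(ranked_lists, targets, k):.2f}")
print(f"(R@10 + R@50)/2 = {recall_10_50_average(ranked_lists, targets):.2f}")
```

This toy run prints R@1 = 25.00, R@5 = 25.00, R@10 = 50.00, R@50 = 75.00 and (R@10 + R@50)/2 = 62.50, since one, one, two and three of the four targets fall inside the respective cutoffs.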

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)