Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Data Roaming and Quality Assessment for Composed Image Retrieval

Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

2023-03-16 · Tasks: Composed Image Retrieval (CoIR), Retrieval, Image Retrieval

Abstract

The task of Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. However, current CoIR datasets are orders of magnitude smaller than other vision and language (V&L) datasets. Additionally, some of these datasets have noticeable issues, such as queries containing redundant modalities. To address these shortcomings, we introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset that is ten times larger than existing ones. Pre-training on LaSCo yields a noteworthy improvement in performance, even in the zero-shot setting. Furthermore, we propose a new approach for analyzing CoIR datasets and methods, which detects modality redundancy or necessity in queries. We also introduce a new CoIR baseline, the Cross-Attention driven Shift Encoder (CASE). This baseline allows for early fusion of modalities using a cross-attention module and employs an additional auxiliary task during training. Our experiments demonstrate that this new baseline outperforms the current state-of-the-art methods on established benchmarks such as FashionIQ and CIRR.
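The abstract mentions early fusion of modalities through a cross-attention module. As a rough illustration of that general idea only, and not of the CASE architecture itself, the sketch below implements single-head scaled dot-product cross-attention in plain Python. The names `cross_attention`, `text_tokens`, and `image_patches` are hypothetical, the learned query/key/value projections of a real model are omitted, and the choice of text tokens as queries over image patches is an assumption made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Each query vector (one modality) attends over the key/value vectors
    (the other modality). No learned projections: a simplification.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return fused


# Hypothetical toy inputs: one text token, two image patches.
text_tokens = [[1.0, 0.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0]]
fused_tokens = cross_attention(text_tokens, image_patches, image_patches)
```

In this toy setup the single text token puts more attention weight on the first image patch, because that patch has the larger dot product with it.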

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Retrieval | LaSCo | Recall@1 (%) | 7.08 | CASE |
| Image Retrieval | LaSCo | Recall@1 (%) | 4.26 | BLIP4CIR |
| Image Retrieval | Fashion IQ | (Recall@10+Recall@50)/2 | 59.73 | CASE |
| Image Retrieval | Fashion IQ | Recall@10 | 48.79 | CASE |
| Image Retrieval | CIRR | (Recall@5+Recall_subset@1)/2 | 78.25 | CASE (Pre-trained on LaSCo.Ca) |
| Image Retrieval | CIRR | Recall@10 | 88.75 | CASE (Pre-trained on LaSCo.Ca) |
| Image Retrieval | CIRR | (Recall@5+Recall_subset@1)/2 | 77.5 | CASE |
| Image Retrieval | CIRR | Recall@10 | 87.25 | CASE |
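
The metrics in the results are built on Recall@K: the percentage of queries whose target image appears among the top K retrieved candidates. The sketch below shows one common way to compute it, along with an average over several K values of the kind used for the Fashion IQ entry, (Recall@10+Recall@50)/2. The function names and the toy rankings are hypothetical, and the benchmarks' official evaluation scripts may differ in details. In particular, the CIRR subset recall ranks only a restricted candidate set and is not covered here.

```python
def recall_at_k(rankings, targets, k):
    """Percentage of queries whose target is within the top-k ranked items.

    rankings: one list of candidate ids per query, best match first.
    targets:  the correct candidate id for each query.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    hits = sum(
        1 for ranking, target in zip(rankings, targets) if target in ranking[:k]
    )
    return 100.0 * hits / len(targets)


def mean_recall(rankings, targets, ks):
    """Average of Recall@K over several cutoffs, e.g. ks=(10, 50)."""
    return sum(recall_at_k(rankings, targets, k) for k in ks) / len(ks)


# Hypothetical toy data: four queries over three candidate images.
rankings = [
    ["a", "b", "c"],
    ["c", "a", "b"],
    ["b", "c", "a"],
    ["a", "c", "b"],
]
targets = ["a", "a", "a", "b"]

r1 = recall_at_k(rankings, targets, 1)        # 25.0
r2 = recall_at_k(rankings, targets, 2)        # 50.0
avg = mean_recall(rankings, targets, (1, 2))  # 37.5
```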
