Zero-shot Composed Text-Image Retrieval

Yikun Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang, Weidi Xie

2023-06-12

Retrieval, Zero-Shot Composed Image Retrieval (ZS-CIR), Image Retrieval

Abstract

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's ability to express what they are looking for. We make the following contributions: (i) we initiate a scalable pipeline to automatically construct datasets for training CIR models, by simply exploiting a large-scale dataset of image-text pairs, e.g., a subset of LAION-5B; (ii) we introduce a transformer-based adaptive aggregation model, TransAgg, which employs a simple yet efficient fusion mechanism to adaptively combine information from diverse modalities; (iii) we conduct extensive ablation studies to investigate the usefulness of our proposed data construction procedure and the effectiveness of the core components in TransAgg; (iv) when evaluated on publicly available benchmarks under the zero-shot scenario, i.e., training on the automatically constructed datasets and then directly conducting inference on target downstream datasets, e.g., CIRR and FashionIQ, our proposed approach either performs on par with or significantly outperforms the existing state-of-the-art (SOTA) models. Project page: https://code-kunkun.github.io/ZS-CIR/

Results

Task	Dataset	Metric	Value	Model
Image Retrieval	Fashion IQ	(Recall@10+Recall@50)/2	44.75	TransAgg (Laion-CIR-Combined)
Image Retrieval	CIRR	R@1	37.87	TransAgg (Laion-CIR-Combined)
Image Retrieval	CIRR	R@5	68.88	TransAgg (Laion-CIR-Combined)
Image Retrieval	CIRR	R@50	93.86	TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR)	Fashion IQ	(Recall@10+Recall@50)/2	44.75	TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR)	CIRR	R@1	37.87	TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR)	CIRR	R@5	68.88	TransAgg (Laion-CIR-Combined)
Composed Image Retrieval (CoIR)	CIRR	R@50	93.86	TransAgg (Laion-CIR-Combined)
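
The metrics in the table above are all variants of Recall@K: the fraction of queries whose target image appears among the K highest-scoring candidates. The following is a minimal sketch of how such numbers could be computed from a query-by-candidate similarity matrix. The function names (`recall_at_k`, `mean_recall_10_50`) and the toy data are illustrative assumptions, not the authors' evaluation code, and benchmark-specific details (for example, how the candidate set is built for CIRR or FashionIQ) are not covered here.

```python
import numpy as np


def recall_at_k(similarity, target_index, k):
    """Fraction of queries whose target is among the k highest-scoring candidates.

    similarity: (num_queries, num_candidates) array of query-candidate scores.
    target_index: (num_queries,) array with the column index of each query's target.
    """
    similarity = np.asarray(similarity, dtype=float)
    target_index = np.asarray(target_index)
    # Sort candidates by descending score and keep the first k columns per query.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query is a hit if its target index appears anywhere in its top-k list.
    hits = (top_k == target_index[:, None]).any(axis=1)
    return float(hits.mean())


def mean_recall_10_50(similarity, target_index):
    """Average of Recall@10 and Recall@50, i.e. (Recall@10 + Recall@50) / 2."""
    r10 = recall_at_k(similarity, target_index, 10)
    r50 = recall_at_k(similarity, target_index, 50)
    return 0.5 * (r10 + r50)


if __name__ == "__main__":
    # Hypothetical toy scores: 3 queries, 4 candidate images.
    sim = [
        [0.9, 0.1, 0.3, 0.2],
        [0.2, 0.8, 0.5, 0.1],
        [0.4, 0.3, 0.2, 0.6],
    ]
    targets = [0, 2, 1]
    for k in (1, 2, 3):
        print(f"R@{k} = {100 * recall_at_k(sim, targets, k):.2f}")
```

The table reports values as percentages, so the fractions returned here would be multiplied by 100 before comparison.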
