Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Compositional Learning of Image-Text Query for Image Retrieval

Muhammad Umer Anwaar, Egor Labintcev, Martin Kleinsteuber

2020-06-19 | Metric Learning | Image Retrieval with Multi-Modal Query | Retrieval | Image Retrieval

Abstract

In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification of the query image, and the task is to retrieve images with the desired modifications. For instance, a user of an e-commerce platform wants to buy a dress that looks similar to her friend's dress, but in white and with a ribbon sash. In this case, we would like the algorithm to retrieve dresses that apply the desired modifications to the query dress. We propose an autoencoder-based model, ComposeAE, to learn the composition of image and text query for retrieving images. We adopt a deep metric learning approach and learn a metric that pushes the composition of source image and text query closer to the target images. We also propose a rotational symmetry constraint on the optimization problem. Our approach outperforms the state-of-the-art method TIRG on three benchmark datasets: MIT-States, Fashion200k and Fashion IQ. To ensure a fair comparison, we introduce strong baselines by enhancing the TIRG method. To ensure reproducibility of the results, we publish our code at https://github.com/ecom-research/ComposeAE.
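
The metric-learning idea in the abstract (pull the composed image-text query toward its target image and away from other images in the batch) can be illustrated with a small sketch. This is not ComposeAE itself: the `compose` function below is a hypothetical additive stand-in for the learned composition, the autoencoder and the rotational symmetry constraint are omitted, and the batch-wise softmax cross-entropy loss is an assumed choice of metric-learning objective. All function names are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def compose(image_feat, text_feat):
    """Hypothetical stand-in for a learned composition: add, then normalize."""
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])


def batch_softmax_loss(composed, targets):
    """Cross-entropy over cosine similarities within a batch.

    Query i should be most similar to target i; every other target in the
    batch acts as a negative. Lower loss means matched pairs are closer.
    """
    targets = [l2_normalize(t) for t in targets]
    total = 0.0
    for i, query in enumerate(composed):
        sims = [dot(query, t) for t in targets]
        top = max(sims)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(s - top) for s in sims))
        total += log_z - sims[i]  # negative log-softmax of the matched pair
    return total / len(composed)


# Toy batch: three image features and three text features.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
queries = [compose(i, t) for i, t in zip(images, texts)]

aligned = batch_softmax_loss(queries, queries)
misaligned = batch_softmax_loss(queries, queries[1:] + queries[:1])
print(aligned < misaligned)
```

In a trained system the composition and the image and text encoders would be learned by minimizing such a loss; here the features are fixed toy vectors, so the sketch only shows that correctly paired targets yield a lower loss than mismatched ones.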

Results

Task | Dataset | Metric | Value | Model
Image Retrieval | Fashion IQ | (Recall@10+Recall@50)/2 | 20.6 | ComposeAE
Image Retrieval with Multi-Modal Query | MIT-States | Recall@1 | 13.9 | ComposeAE
Image Retrieval with Multi-Modal Query | MIT-States | Recall@10 | 47.9 | ComposeAE
Image Retrieval with Multi-Modal Query | MIT-States | Recall@5 | 35.5 | ComposeAE
Image Retrieval with Multi-Modal Query | Fashion200k | Recall@1 | 22.8 | ComposeAE
Image Retrieval with Multi-Modal Query | Fashion200k | Recall@10 | 55.3 | ComposeAE
Image Retrieval with Multi-Modal Query | Fashion200k | Recall@50 | 73.4 | ComposeAE
Image Retrieval with Multi-Modal Query | Fashion IQ | Recall@10 | 11.8 | ComposeAE
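
The metrics in these results can be computed along the following lines. The sketch assumes a common convention for this task: Recall@k is the percentage of queries for which at least one correct target appears among the top k retrieved items. The function names and the toy data are illustrative and are not taken from the paper's code.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Percentage of queries with at least one relevant item in the top k.

    ranked_ids:   one list per query, retrieved item ids ordered best first
    relevant_ids: one set per query, ids of the correct target items
    """
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if any(item in relevant for item in ranked[:k])
    )
    return 100.0 * hits / len(ranked_ids)


def mean_recall(ranked_ids, relevant_ids, ks=(10, 50)):
    """Average of Recall@k over several cutoffs, e.g. (Recall@10+Recall@50)/2."""
    return sum(recall_at_k(ranked_ids, relevant_ids, k) for k in ks) / len(ks)


# Toy example with two queries and three retrieved items each.
ranked = [[3, 1, 2], [5, 4, 6]]
relevant = [{1}, {6}]
print(recall_at_k(ranked, relevant, k=2))
print(mean_recall(ranked, relevant, ks=(1, 3)))
```

With the default cutoffs, `mean_recall` corresponds to the averaged metric reported for Fashion IQ, provided the evaluation follows the convention assumed here.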

Related Papers

Unsupervised Ground Metric Learning (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval (2025-07-17)
Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)