

Composed Image Retrieval for Training-Free Domain Conversion

Nikos Efthymiadis, Bill Psomas, Zakaria Laskar, Konstantinos Karantzalos, Yannis Avrithis, Ondřej Chum, Giorgos Tolias

2024-12-04 | Retrieval | Zero-Shot Composed Image Retrieval (ZS-CIR) | Language Modelling | Image Retrieval

Abstract

This work addresses composed image retrieval in the context of domain conversion, where the content of a query image is retrieved in the domain specified by the query text. We show that a strong vision-language model provides sufficient descriptive power without additional training. The query image is mapped to the text input space using textual inversion. Unlike the common practice of inverting in the continuous space of text tokens, we use the discrete word space via a nearest-neighbor search in a text vocabulary. With this inversion, the image is softly mapped across the vocabulary, and the mapping is made more robust using retrieval-based augmentation. Database images are retrieved by a weighted ensemble of text queries combining mapped words with the domain text. Our method outperforms prior art by a large margin on standard and newly introduced benchmarks. Code: https://github.com/NikosEfth/freedom
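The pipeline described in the abstract (nearest-neighbor inversion into a word vocabulary, soft weights over the selected words, then a weighted ensemble of text queries) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the 4-dimensional embeddings, the prompt template, the softmax temperature, and the bag-of-words `toy_encode_text` stand-in for a real text encoder are all assumptions made for illustration, and the retrieval-based augmentation step is omitted.

```python
import math

AXES = ["dog", "cat", "photo", "sketch"]  # hypothetical toy embedding axes


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def toy_encode_text(prompt):
    # Hypothetical stand-in for a vision-language text encoder:
    # a unit-norm bag-of-words vector over AXES.
    words = prompt.split()
    return normalize([1.0 if a in words else 0.0 for a in AXES])


def invert_to_words(image_emb, vocab_embs, k=2, temperature=0.1):
    """Pick the k vocabulary words nearest to the image; weight them by softmax."""
    sims = {w: dot(image_emb, e) for w, e in vocab_embs.items()}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    exps = [math.exp(sims[w] / temperature) for w in top]
    total = sum(exps)
    return [(w, e / total) for w, e in zip(top, exps)]


def rank_database(word_weights, domain, encode_text, db_embs):
    """Rank database images by a weighted sum of similarities to text queries."""
    scores = {name: 0.0 for name in db_embs}
    for word, weight in word_weights:
        query = encode_text(f"{domain} of a {word}")  # assumed prompt template
        for name, emb in db_embs.items():
            scores[name] += weight * dot(query, emb)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: the query image is a photo of a dog; the target domain is "sketch".
vocab_embs = {w: toy_encode_text(w) for w in ["dog", "cat"]}
image_emb = normalize([0.9, 0.1, 1.0, 0.0])
db_embs = {
    "dog_sketch": normalize([1.0, 0.0, 0.0, 1.0]),
    "cat_sketch": normalize([0.0, 1.0, 0.0, 1.0]),
    "dog_photo": normalize([1.0, 0.0, 1.0, 0.0]),
}

word_weights = invert_to_words(image_emb, vocab_embs)
ranking = rank_database(word_weights, "sketch", toy_encode_text, db_embs)
print(word_weights)
print(ranking)
```

In this toy setup the image is mapped mostly to the word "dog", so the database image that matches both the mapped word and the requested domain ranks first. Whether the ensemble is formed over similarity scores, as here, or over query embeddings is a design choice the abstract does not specify.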

Results

Task | Dataset | Metric | Value | Model
Image Retrieval | Large Time Lags Location (LTLL) | mAP | 33.24 | FreeDom (CLIP-L/14)
Image Retrieval | Large Time Lags Location (LTLL) | mAP | 26.6 | WeiCom (CLIP-L/14)
Image Retrieval | Large Time Lags Location (LTLL) | mAP | 25.46 | SEARLE (CLIP-L/14)
Image Retrieval | Large Time Lags Location (LTLL) | mAP | 24.21 | MagicLens (CLIP-L/14)
Image Retrieval | Large Time Lags Location (LTLL) | mAP | 21.61 | CompoDiff (CLIP-L/14)
Image Retrieval | Large Time Lags Location (LTLL) | mAP | 21.27 | Pic2Word (CLIP-L/14)
Image Retrieval | MiniDomainNet | mAP | 37.27 | FreeDom (CLIP-L/14)
Image Retrieval | MiniDomainNet | mAP | 22.95 | CompoDiff (CLIP-L/14)
Image Retrieval | MiniDomainNet | mAP | 21.78 | SEARLE (CLIP-L/14)
Image Retrieval | MiniDomainNet | mAP | 20.06 | MagicLens (CLIP-L/14)
Image Retrieval | MiniDomainNet | mAP | 12 | Pic2Word (CLIP-L/14)
Image Retrieval | MiniDomainNet | mAP | 8.52 | WeiCom (CLIP-L/14)
Image Retrieval | NICO++ | mAP | 26.1 | FreeDom (CLIP-L/14)
Image Retrieval | NICO++ | mAP | 19.66 | MagicLens (CLIP-L/14)
Image Retrieval | NICO++ | mAP | 15.13 | SEARLE (CLIP-L/14)
Image Retrieval | NICO++ | mAP | 10.54 | WeiCom (CLIP-L/14)
Image Retrieval | NICO++ | mAP | 10.32 | CompoDiff (CLIP-L/14)
Image Retrieval | NICO++ | mAP | 9.76 | Pic2Word (CLIP-L/14)
Image Retrieval | ImageNet-R | mAP | 29.91 | FreeDom (CLIP-L/14)
Image Retrieval | ImageNet-R | mAP | 14.04 | SEARLE (CLIP-L/14)
Image Retrieval | ImageNet-R | mAP | 12.88 | CompoDiff (CLIP-L/14)
Image Retrieval | ImageNet-R | mAP | 10.47 | WeiCom (CLIP-L/14)
Image Retrieval | ImageNet-R | mAP | 9.13 | MagicLens (CLIP-L/14)
Image Retrieval | ImageNet-R | mAP | 7.88 | Pic2Word (CLIP-L/14)
Composed Image Retrieval (CoIR) | Large Time Lags Location (LTLL) | mAP | 33.24 | FreeDom (CLIP-L/14)
Composed Image Retrieval (CoIR) | Large Time Lags Location (LTLL) | mAP | 26.6 | WeiCom (CLIP-L/14)
Composed Image Retrieval (CoIR) | Large Time Lags Location (LTLL) | mAP | 25.46 | SEARLE (CLIP-L/14)
Composed Image Retrieval (CoIR) | Large Time Lags Location (LTLL) | mAP | 24.21 | MagicLens (CLIP-L/14)
Composed Image Retrieval (CoIR) | Large Time Lags Location (LTLL) | mAP | 21.61 | CompoDiff (CLIP-L/14)
Composed Image Retrieval (CoIR) | Large Time Lags Location (LTLL) | mAP | 21.27 | Pic2Word (CLIP-L/14)
Composed Image Retrieval (CoIR) | MiniDomainNet | mAP | 37.27 | FreeDom (CLIP-L/14)
Composed Image Retrieval (CoIR) | MiniDomainNet | mAP | 22.95 | CompoDiff (CLIP-L/14)
Composed Image Retrieval (CoIR) | MiniDomainNet | mAP | 21.78 | SEARLE (CLIP-L/14)
Composed Image Retrieval (CoIR) | MiniDomainNet | mAP | 20.06 | MagicLens (CLIP-L/14)
Composed Image Retrieval (CoIR) | MiniDomainNet | mAP | 12 | Pic2Word (CLIP-L/14)
Composed Image Retrieval (CoIR) | MiniDomainNet | mAP | 8.52 | WeiCom (CLIP-L/14)
Composed Image Retrieval (CoIR) | NICO++ | mAP | 26.1 | FreeDom (CLIP-L/14)
Composed Image Retrieval (CoIR) | NICO++ | mAP | 19.66 | MagicLens (CLIP-L/14)
Composed Image Retrieval (CoIR) | NICO++ | mAP | 15.13 | SEARLE (CLIP-L/14)
Composed Image Retrieval (CoIR) | NICO++ | mAP | 10.54 | WeiCom (CLIP-L/14)
Composed Image Retrieval (CoIR) | NICO++ | mAP | 10.32 | CompoDiff (CLIP-L/14)
Composed Image Retrieval (CoIR) | NICO++ | mAP | 9.76 | Pic2Word (CLIP-L/14)
Composed Image Retrieval (CoIR) | ImageNet-R | mAP | 29.91 | FreeDom (CLIP-L/14)
Composed Image Retrieval (CoIR) | ImageNet-R | mAP | 14.04 | SEARLE (CLIP-L/14)
Composed Image Retrieval (CoIR) | ImageNet-R | mAP | 12.88 | CompoDiff (CLIP-L/14)
Composed Image Retrieval (CoIR) | ImageNet-R | mAP | 10.47 | WeiCom (CLIP-L/14)
Composed Image Retrieval (CoIR) | ImageNet-R | mAP | 9.13 | MagicLens (CLIP-L/14)
Composed Image Retrieval (CoIR) | ImageNet-R | mAP | 7.88 | Pic2Word (CLIP-L/14)
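
For readers unfamiliar with the metric in the results above, mean average precision (mAP) can be computed as in the following minimal sketch. It uses the standard definition (precision at each rank where a relevant item appears, averaged over the relevant items, then averaged over queries). The function names, the toy rankings, and the scaling to a 0-100 range are assumptions; the exact evaluation protocol behind the reported numbers is not given on this page.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """Mean of per-query average precision, scaled to 0-100 (assumed convention)."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return 100.0 * sum(aps) / len(aps)


# Hypothetical toy rankings for two queries.
rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
ground_truth = {"q1": {"a", "c"}, "q2": {"y"}}

ap_q1 = average_precision(rankings["q1"], ground_truth["q1"])
map_score = mean_average_precision(rankings, ground_truth)
print(ap_q1, map_score)
```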
