Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval

Dixuan Lin, Yixing Peng, Jingke Meng, Wei-Shi Zheng

2023-12-04 · Cross-Modal Person Re-Identification · Attribute · Image to text · Text-based Person Retrieval · Person Retrieval · Person Re-Identification · Retrieval
Paper · PDF

Abstract

Text-to-image person re-identification (ReID) aims to retrieve images of a person based on a given textual description. The key challenge is to learn the relations between detailed information from visual and textual modalities. Existing works focus on learning a latent space to narrow the modality gap and further build local correspondences between two modalities. However, these methods assume that image-to-text and text-to-image associations are modality-agnostic, resulting in suboptimal associations. In this work, we show the discrepancy between image-to-text association and text-to-image association and propose CADA: Cross-Modal Adaptive Dual Association, which finely builds bidirectional image-text detailed associations. Our approach features a decoder-based adaptive dual association module that enables full interaction between visual and textual modalities, allowing for bidirectional and adaptive cross-modal correspondence associations. Specifically, the paper proposes a bidirectional association mechanism: Association of text Tokens to image Patches (ATP) and Association of image Regions to text Attributes (ARA). We adaptively model the ATP based on the fact that aggregating cross-modal features based on mistaken associations will lead to feature distortion. For modeling the ARA, since the attributes are typically the first distinguishing cues of a person, we propose to explore the attribute-level association by predicting the masked text phrase using the related image region. Finally, we learn the dual associations between texts and images, and the experimental results demonstrate the superiority of our dual formulation. Code will be made publicly available.
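The decoder-based token-to-patch association described in the abstract can be pictured as cross-attention in which each text token queries the image patches, and the attention weights serve as soft token-to-patch associations. The sketch below is illustrative only (the module name, dimensions, and layer choices are assumptions, not the authors' released implementation):

```python
import torch
import torch.nn as nn

class TokenPatchAssociation(nn.Module):
    """Illustrative ATP-style sketch: text tokens attend over image patches
    via cross-attention. Hypothetical names/shapes; not the CADA codebase."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # Queries come from the text side; keys/values from the image side.
        # `weights` holds one soft association per (token, patch) pair.
        attended, weights = self.cross_attn(
            query=text_tokens, key=image_patches, value=image_patches
        )
        return self.norm(text_tokens + attended), weights

# Toy usage: 8 text tokens attending over 49 image patches (7x7 grid).
tokens = torch.randn(1, 8, 256)
patches = torch.randn(1, 49, 256)
out, attn = TokenPatchAssociation()(tokens, patches)
```

In this reading, the "adaptive" part of ATP would correspond to down-weighting mistaken associations so that aggregation does not distort the token features; how that weighting is learned is specific to the paper.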

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text based Person Retrieval | CUHK-PEDES | Rank-1 | 78.37 | CADA |
| Text based Person Retrieval | CUHK-PEDES | Rank-5 | 91.57 | CADA |
| Text based Person Retrieval | CUHK-PEDES | Rank-10 | 94.58 | CADA |
| Text based Person Retrieval | CUHK-PEDES | mAP | 68.87 | CADA |
| Text based Person Retrieval | ICFG-PEDES | Rank-1 | 67.81 | CADA |
| Text based Person Retrieval | ICFG-PEDES | Rank-5 | 82.34 | CADA |
| Text based Person Retrieval | ICFG-PEDES | Rank-10 | 87.14 | CADA |
| Text based Person Retrieval | ICFG-PEDES | mAP | 39.85 | CADA |
| Text based Person Retrieval | RSTPReid | Rank-1 | 69.6 | CADA |
| Text based Person Retrieval | RSTPReid | Rank-5 | 86.75 | CADA |
| Text based Person Retrieval | RSTPReid | Rank-10 | 92.4 | CADA |
| Text based Person Retrieval | RSTPReid | mAP | 52.74 | CADA |
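The Rank-k numbers above follow the standard retrieval convention: a query counts as a hit if any correctly matching gallery item appears in its top-k results. A minimal sketch of that computation (toy data; not the benchmark's official evaluation script):

```python
import numpy as np

def rank_k_accuracy(similarity, gallery_labels, query_labels, k):
    """Fraction of queries whose top-k retrieved gallery items contain a
    correct identity match. `similarity` has shape (num_queries, num_gallery)."""
    # Sort gallery indices by descending similarity and keep the top k.
    topk = np.argsort(-similarity, axis=1)[:, :k]
    hits = [query_labels[i] in gallery_labels[topk[i]]
            for i in range(len(query_labels))]
    return float(np.mean(hits))

# Toy example: 2 text queries scored against 3 gallery images.
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.8, 0.4]])
gallery = np.array([0, 1, 2])   # gallery identity labels
queries = np.array([0, 2])      # query identity labels
print(rank_k_accuracy(sim, gallery, queries, k=1))  # → 0.5
```

mAP additionally averages precision over all correct matches per query, so it rewards ranking every image of the target person highly, not just the first one.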

Related Papers

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning (2025-07-17)
WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Non-Adaptive Adversarial Face Generation (2025-07-16)