TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/ViTAA: Visual-Textual Attributes Alignment in Person Searc...

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang

2020-05-15ECCV 2020 8AttributePerson SearchContrastive LearningText based Person Retrieval
PaperPDFCode(official)Code

Abstract

Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions. While most of the current methods treat the task as a holistic visual and textual feature matching one, we approach it from an attribute-aligning perspective that allows grounding specific attribute phrases to the corresponding visual regions. We achieve success as well as the performance boosting by a robust feature learning that the referred identity can be accurately bundled by multiple attribute visual cues. To be concrete, our Visual-Textual Attribute Alignment model (dubbed as ViTAA) learns to disentangle the feature space of a person into subspaces corresponding to attributes using a light auxiliary attribute segmentation computing branch. It then aligns these visual features with the textual attributes parsed from the sentences by using a novel contrastive learning loss. Upon that, we validate our ViTAA framework through extensive experiments on tasks of person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performances. Code will be publicly available upon publication.

Results

TaskDatasetMetricValueModel
Text based Person RetrievalCUHK-PEDESR@155.97ViTAA
Text based Person RetrievalCUHK-PEDESR@1083.52ViTAA
Text based Person RetrievalCUHK-PEDESR@575.84ViTAA

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM2025-07-16Non-Adaptive Adversarial Face Generation2025-07-16Similarity-Guided Diffusion for Contrastive Sequential Recommendation2025-07-16Attributes Shape the Embedding Space of Face Recognition Models2025-07-15