TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Improving Description-based Person Re-identification by Mu...

Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments

Kai Niu, Yan Huang, Wanli Ouyang, Liang Wang

2019-06-23Person Re-IdentificationText based Person Retrieval
PaperPDF

Abstract

Description-based person re-identification (Re-id) is an important task in video surveillance that requires discriminative cross-modal representations to distinguish different people. It is difficult to directly measure the similarity between images and descriptions due to the modality heterogeneity (the cross-modal problem). And all samples belonging to a single category (the fine-grained problem) makes this task even harder than the conventional image-description matching task. In this paper, we propose a Multi-granularity Image-text Alignments (MIA) model to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person Re-id. Specifically, three different granularities, i.e., global-global, global-local and local-local alignments are carried out hierarchically. Firstly, the global-global alignment in the Global Contrast (GC) module is for matching the global contexts of images and descriptions. Secondly, the global-local alignment employs the potential relations between local components and global contexts to highlight the distinguishable components while eliminating the uninvolved ones adaptively in the Relation-guided Global-local Alignment (RGA) module. Thirdly, as for the local-local alignment, we match visual human parts with noun phrases in the Bi-directional Fine-grained Matching (BFM) module. The whole network combining multiple granularities can be end-to-end trained without complex pre-processing. To address the difficulties in training the combination of multiple granularities, an effective step training strategy is proposed to train these granularities step-by-step. Extensive experiments and analysis have shown that our method obtains the state-of-the-art performance on the CUHK-PEDES dataset and outperforms the previous methods by a significant margin.

Results

TaskDatasetMetricValueModel
Text based Person RetrievalCUHK-PEDESR@153.1MIA
Text based Person RetrievalCUHK-PEDESR@1082.9MIA
Text based Person RetrievalCUHK-PEDESR@575MIA

Related Papers

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning2025-07-17WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding2025-07-17Try Harder: Hard Sample Generation and Learning for Clothes-Changing Person Re-ID2025-07-15Mind the Gap: Bridging Occlusion in Gait Recognition via Residual Gap Correction2025-07-15KeyRe-ID: Keypoint-Guided Person Re-Identification using Part-Aware Representation in Videos2025-07-10CORE-ReID V2: Advancing the Domain Adaptation for Object Re-Identification with Optimized Training and Ensemble Fusion2025-07-04Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence2025-07-02DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding2025-06-16