Zefeng Ding, Changxing Ding, Zhiyin Shao, Dacheng Tao
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level features from the two modalities. Second, we design a multi-view non-local network that captures the relationships between body parts, thereby establishing better correspondences between body parts and noun phrases. Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features. Finally, to expedite future research in text-to-image ReID, we build a new database named ICFG-PEDES. Extensive experiments demonstrate that SSAN outperforms state-of-the-art approaches by significant margins. Both the new ICFG-PEDES database and the SSAN code are available at https://github.com/zifyloo/SSAN.
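
The idea behind the Compound Ranking loss described above (a standard ranking term for the matched description, plus a weaker term for descriptions of other images of the same identity) can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the one-directional (image-anchored) form, and the values of `margin_strong`, `margin_weak`, and `weak_weight` are all assumptions chosen for illustration.

```python
import numpy as np

def compound_ranking_sketch(sim_matched, sim_same_id, sim_negative,
                            margin_strong=0.2, margin_weak=0.1, weak_weight=0.1):
    """Hypothetical hinge-style sketch of a compound ranking objective.

    sim_matched:  similarity between each image and its own description
    sim_same_id:  similarity between each image and a description written
                  for a different image of the same identity (weak positive)
    sim_negative: similarity between each image and a description of a
                  different identity
    All margins and weights are illustrative assumptions, not paper values.
    """
    sim_matched = np.asarray(sim_matched, dtype=float)
    sim_same_id = np.asarray(sim_same_id, dtype=float)
    sim_negative = np.asarray(sim_negative, dtype=float)

    # Strong supervision: the matched description must beat the negative
    # by at least margin_strong.
    strong = np.maximum(margin_strong - sim_matched + sim_negative, 0.0)
    # Weak supervision: a same-identity description must also beat the
    # negative, but with a smaller margin and a smaller weight.
    weak = np.maximum(margin_weak - sim_same_id + sim_negative, 0.0)
    return float(np.mean(strong + weak_weight * weak))

loss = compound_ranking_sketch(
    sim_matched=[0.9, 0.3],
    sim_same_id=[0.6, 0.2],
    sim_negative=[0.1, 0.4],
)
```

The weak term pulls all descriptions of one identity toward the same image features, which is how extra supervision of this kind could reduce intra-class variance in the textual features.
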
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Retrieval | ICFG-PEDES | Rank-1 | 54.23 | SSAN |
| Text-based Person Retrieval | CUHK-PEDES | R@1 | 61.37 | SSAN |
| Text-based Person Retrieval | CUHK-PEDES | R@5 | 80.15 | SSAN |
| Text-based Person Retrieval | CUHK-PEDES | R@10 | 86.73 | SSAN |
| Text-based Person Retrieval | ICFG-PEDES | R@1 | 54.23 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | ICFG-PEDES | Rank-1 | 40.57 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | ICFG-PEDES | Rank-5 | 62.58 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | ICFG-PEDES | Rank-10 | 71.53 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | ICFG-PEDES | mAP | 20.93 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | ICFG-PEDES | mINP | 2.22 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | RSTPReid | Rank-1 | 35.1 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | RSTPReid | Rank-5 | 60 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | RSTPReid | Rank-10 | 71.45 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | RSTPReid | mAP | 28.9 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | RSTPReid | mINP | 12.08 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | CUHK-PEDES | Rank-1 | 46.52 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | CUHK-PEDES | Rank-5 | 68.36 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | CUHK-PEDES | Rank-10 | 77.42 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | CUHK-PEDES | mAP | 42.49 | SSAN |
| Text-based Person Retrieval with Noisy Correspondence | CUHK-PEDES | mINP | 28.13 | SSAN |
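
For readers unfamiliar with the metrics in the table, the sketch below shows one common way Rank-k (also written R@k), mAP, and mINP are computed from a text-to-image similarity matrix. It is a minimal illustration, not the evaluation code behind the reported numbers: the function name `retrieval_metrics`, its arguments, and the toy data are assumptions, and it assumes each text query is scored against every gallery image with higher similarity meaning a better match.

```python
import numpy as np

def retrieval_metrics(sim, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Compute Rank-k, mAP and mINP (in percent) for text-to-image retrieval.

    sim:         (num_queries, num_gallery) similarity matrix, higher is better
    query_ids:   identity label of each text query
    gallery_ids: identity label of each gallery image
    """
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Sort gallery images for each query by descending similarity.
    order = np.argsort(-sim, axis=1, kind="stable")
    matches = gallery_ids[order] == query_ids[:, None]  # (Q, G) boolean

    # Queries with no correct image in the gallery cannot be scored.
    matches = matches[matches.any(axis=1)]
    num_gallery = matches.shape[1]
    results = {}

    # Rank-k: fraction of queries with a correct image in the top k.
    for k in ks:
        results[f"Rank-{k}"] = 100.0 * matches[:, :k].any(axis=1).mean()

    # mAP: mean over queries of the average precision at each correct hit.
    ranks = np.arange(1, num_gallery + 1)
    num_pos = matches.sum(axis=1)
    precision = np.cumsum(matches, axis=1) / ranks
    ap = (precision * matches).sum(axis=1) / num_pos
    results["mAP"] = 100.0 * ap.mean()

    # mINP: number of correct images divided by the rank of the hardest
    # (last-retrieved) correct image, averaged over queries.
    hardest_rank = num_gallery - np.argmax(matches[:, ::-1], axis=1)
    results["mINP"] = 100.0 * (num_pos / hardest_rank).mean()
    return results

# Toy example: 2 text queries, 3 gallery images.
metrics = retrieval_metrics(
    sim=[[0.9, 0.1, 0.5],
         [0.8, 0.2, 0.3]],
    query_ids=[0, 1],
    gallery_ids=[0, 1, 0],
)
```

In the toy example the first query retrieves both of its correct images first, while the second finds its only correct image last, so Rank-1 is 50 and both mAP and mINP come out at about 66.67.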