Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval

Delong Liu, Haiwen Li, Zhaohui Hou, Zhicheng Zhao, Fei Su, Yuan Dong

2023-11-25

Zero-shot Composed Person Retrieval, Text-based Person Retrieval, Person Retrieval, Retrieval, Image Generation, Text based Person Retrieval


Abstract

Person retrieval has attracted rising attention. Existing methods fall mainly into two retrieval modes: image-only and text-only. However, neither makes full use of the available information, and both struggle to meet diverse application requirements. To address these limitations, we propose a new Composed Person Retrieval (CPR) task, which combines visual and textual queries to identify individuals of interest in large-scale person image databases. The foremost difficulty of the CPR task, however, is the lack of annotated datasets. We therefore first introduce a scalable automatic data synthesis pipeline that decomposes complex multimodal data generation into the creation of textual quadruples followed by identity-consistent image synthesis with fine-tuned generative models. A multimodal filtering method is designed to ensure that the resulting SynCPR dataset retains 1.15 million high-quality, fully synthetic triplets. To improve the representation of composed person queries, we further propose a novel Fine-grained Adaptive Feature Alignment (FAFA) framework based on fine-grained dynamic alignment and masked feature reasoning. For objective evaluation, we also manually annotate the Image-Text Composed Person Retrieval (ITCPR) test set. Extensive experiments demonstrate the effectiveness of the SynCPR dataset and the superiority of the proposed FAFA framework over state-of-the-art methods. All code and data will be provided at https://github.com/Delong-liu-bupt/Composed_Person_Retrieval.
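To make the CPR setting concrete, here is a minimal sketch of ranking a gallery against a composed image-plus-text query. It is not FAFA: the fusion rule (a weighted sum of L2-normalized features controlled by a hypothetical `alpha`) is a simple late-fusion baseline assumed for illustration, and the vectors stand in for embeddings from some image and text encoder that share one space.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_feat: Vector, text_feat: Vector, alpha: float = 0.5) -> List[float]:
    """Hypothetical late-fusion baseline (NOT FAFA): weighted sum of the
    normalized reference-image and query-text features, renormalized."""
    if len(image_feat) != len(text_feat):
        raise ValueError("image and text features must share one embedding space")
    img, txt = l2_normalize(image_feat), l2_normalize(text_feat)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(img, txt)]
    return l2_normalize(fused)


def rank_gallery(query: Vector, gallery: Dict[str, Vector]) -> List[str]:
    """Return gallery image ids sorted by cosine similarity to the query."""
    q = l2_normalize(query)
    scores = {
        image_id: sum(a * b for a, b in zip(q, l2_normalize(feat)))
        for image_id, feat in gallery.items()
    }
    return sorted(scores, key=lambda image_id: scores[image_id], reverse=True)


# Usage with toy 3-d "embeddings" (made-up numbers for illustration only).
query = compose_query([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
gallery = {"img_a": [1.0, 1.0, 0.0], "img_b": [1.0, 0.0, 0.0], "img_c": [0.0, 0.0, 1.0]}
print(rank_gallery(query, gallery))  # prints ['img_a', 'img_b', 'img_c']
```

A learned method such as FAFA would replace the hand-set fusion in `compose_query` with trained alignment; ranking by cosine similarity is a common choice for the final step, but it is assumed here rather than taken from the paper.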

Results

Task	Dataset	Model	Rank-1	mAP
Image Retrieval with Multi-Modal Query	ITCPR dataset	FAFA	46.54	55.6
Image Retrieval with Multi-Modal Query	ITCPR dataset	Word4Per (FAFA old version)	45.55	55.26
Cross-Modal Information Retrieval	ITCPR dataset	FAFA	46.54	55.6
Cross-Modal Information Retrieval	ITCPR dataset	Word4Per (FAFA old version)	45.55	55.26
Cross-Modal Retrieval	ITCPR dataset	FAFA	46.54	55.6
Cross-Modal Retrieval	ITCPR dataset	Word4Per (FAFA old version)	45.55	55.26
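Rank-1 and mAP are standard retrieval metrics. The following is a minimal sketch of how they are commonly computed, assuming each query returns its full gallery ranking as a list of identity labels and that a gallery image counts as relevant when its identity matches the query's. The exact ITCPR evaluation protocol may differ, and scaling by 100 assumes the table above reports percentages.

```python
from typing import Sequence, Tuple


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """AP for one query: mean of precision@k over the ranks k of relevant items.
    Assumes ranked_ids covers the whole gallery, so every relevant item appears."""
    hits, precision_sum = 0, 0.0
    for k, pid in enumerate(ranked_ids, start=1):
        if pid == query_id:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def rank1_and_map(results: Sequence[Tuple[int, Sequence[int]]]) -> Tuple[float, float]:
    """results: (query identity, gallery identities sorted by score) per query.
    Returns (Rank-1, mAP) scaled to percentages."""
    if not results:
        raise ValueError("need at least one query")
    n = len(results)
    rank1 = sum(1 for qid, ranked in results if ranked and ranked[0] == qid) / n
    mean_ap = sum(average_precision(ranked, qid) for qid, ranked in results) / n
    return 100.0 * rank1, 100.0 * mean_ap


# Toy example: query 1 hits at ranks 1 and 3; query 2 misses rank 1, hits rank 2.
r1, m_ap = rank1_and_map([(1, [1, 2, 1]), (2, [3, 2])])
print(round(r1, 2), round(m_ap, 2))  # prints 50.0 66.67
```

Rank-1 only checks the top result, while mAP rewards placing every matching image early, which is why the two metrics can move by different amounts between FAFA and Word4Per.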
