MARS: Paying more attention to visual attributes for text-based person search

Alex Ergasti, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati

2024-07-05

Tasks: Attribute Person Search, Person Retrieval, Person Re-Identification, Text based Person Search, Text based Person Retrieval


Abstract

Text-based person search (TBPS) is a problem that has gained significant interest within the research community. The task is to retrieve one or more images of a specific individual based on a textual description. The multi-modal nature of the task requires learning representations that bridge text and image data within a shared latent space. Existing TBPS systems face two major challenges. The first is inter-identity noise, which stems from the inherent vagueness and imprecision of textual descriptions: a description of visual attributes can generally fit several different people. The second is intra-identity variation, i.e. nuisances such as pose and illumination that can alter the visual appearance of the same textual attributes for a given subject. To address these issues, this paper presents a novel TBPS architecture named MARS (Mae-Attribute-Relation-Sensitive), which enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss. The former employs a Masked AutoEncoder trained to reconstruct randomly masked image patches with the aid of the textual description. In doing so, the model is encouraged to learn more expressive representations and textual-visual relations in the latent space. The Attribute Loss, instead, balances the contribution of different types of attributes, defined as adjective-noun chunks of text. This loss ensures that every attribute is taken into consideration in the person retrieval process. Extensive experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements, with significant gains in the mean Average Precision (mAP) metric over the current state of the art.
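To make the two ingredients of the abstract more concrete, here is a minimal sketch, not the authors' implementation, of (a) MAE-style random patch masking with a reconstruction loss computed only on the masked patches, and (b) extraction of adjective-noun chunks from a part-of-speech-tagged caption. The function names, the 0.75 mask ratio, the 192-patch/768-dim shapes, and the Universal POS tags ("ADJ", "NOUN") are illustrative assumptions. The text conditioning of the decoder and the actual weighting used by the Attribute Loss are not described here and are not modeled.

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: boolean vector, True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """MAE-style loss: per-patch mean squared error, averaged over masked patches only.

    pred, target: (num_patches, patch_dim) arrays; mask: (num_patches,) booleans.
    """
    if not mask.any():
        raise ValueError("at least one patch must be masked")
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())

def adjective_noun_chunks(tagged):
    """Hypothetical helper: collect runs of adjectives that end in a noun.

    tagged: list of (word, tag) pairs, tags assumed to be Universal POS labels.
    """
    chunks, adjs = [], []
    for word, tag in tagged:
        if tag == "ADJ":
            adjs.append(word)          # keep accumulating modifiers
        elif tag == "NOUN" and adjs:
            chunks.append(" ".join(adjs + [word]))
            adjs = []                  # chunk closed by its head noun
        else:
            adjs = []                  # any other token breaks the run
    return chunks

rng = np.random.default_rng(0)
mask = random_patch_mask(192, 0.75, rng)   # e.g. 24x8 patches of a person crop
target = rng.normal(size=(192, 768))       # flattened 16x16x3 patches
print(int(mask.sum()), masked_reconstruction_loss(target + 1.0, target, mask))
print(adjective_noun_chunks([("a", "DET"), ("woman", "NOUN"), ("with", "ADP"),
                             ("long", "ADJ"), ("black", "ADJ"), ("hair", "NOUN")]))
```

Restricting the loss to masked patches mirrors the original MAE design: the model gets no credit for copying visible pixels, so it must infer the hidden content from context, which in MARS also includes the textual description.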

Results

Task: Text based Person Retrieval. Model: MARS. All values in %.

Dataset	R@1	R@5	R@10	mAP
CUHK-PEDES	77.62	90.63	94.27	71.41
ICFG-PEDES	67.6	81.47	85.79	44.93
RSTPReid	67.55	86.65	91.35	52.92
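For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute Rank-K accuracy (R@K) and mAP for text-to-image retrieval from a query-by-gallery similarity matrix, where a gallery image counts as correct when its identity matches the query's. It is an illustrative implementation under that assumption, not the official evaluation code for these benchmarks; the function name `retrieval_metrics` is hypothetical.

```python
import numpy as np

def retrieval_metrics(sim, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Hypothetical helper: R@K and mAP (in %) for text-to-image retrieval.

    sim: (num_queries, num_gallery) scores, higher means more similar.
    query_ids, gallery_ids: identity labels used to decide correct matches.
    """
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(-sim, axis=1, kind="stable")         # best match first
    matches = gallery_ids[order] == query_ids[:, None]       # (Q, G) booleans

    results = {}
    for k in ks:
        # A query succeeds at K if any correct image appears in its top K.
        results[f"R@{k}"] = 100.0 * matches[:, :k].any(axis=1).mean()

    aps = []
    for row in matches:
        hits = np.flatnonzero(row)                 # 0-based ranks of correct images
        if hits.size == 0:
            continue                               # no ground truth: skipped for mAP
        precision_at_hits = np.arange(1, hits.size + 1) / (hits + 1)
        aps.append(precision_at_hits.mean())       # average precision of this query
    results["mAP"] = 100.0 * float(np.mean(aps))
    return results

print(retrieval_metrics([[0.2, 0.9, 0.5], [0.1, 0.8, 0.3]],
                        query_ids=[0, 1], gallery_ids=[0, 1, 0]))
```

R@K only rewards the first correct hit, while mAP accounts for the ranks of all images of the queried identity, which is why the two can move differently and why the abstract highlights gains in mAP specifically.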
