TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/MARS: Paying more attention to visual attributes for text-...

MARS: Paying more attention to visual attributes for text-based person search

Alex Ergasti, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati

2024-07-05AttributePerson SearchPerson RetrievalPerson Re-IdentificationText based Person SearchText based Person Retrieval
PaperPDFCode(official)

Abstract

Text-based person search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature of the task requires learning representations that bridge text and image data within a shared latent space. Existing TBPS systems face two major challenges. One is defined as inter-identity noise that is due to the inherent vagueness and imprecision of text descriptions and it indicates how descriptions of visual attributes can be generally associated to different people; the other is the intra-identity variations, which are all those nuisances e.g. pose, illumination, that can alter the visual appearance of the same textual attributes for a given subject. To address these issues, this paper presents a novel TBPS architecture named MARS (Mae-Attribute-Relation-Sensitive), which enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss. The former employs a Masked AutoEncoder trained to reconstruct randomly masked image patches with the aid of the textual description. In doing so the model is encouraged to learn more expressive representations and textual-visual relations in the latent space. The Attribute Loss, instead, balances the contribution of different types of attributes, defined as adjective-noun chunks of text. This loss ensures that every attribute is taken into consideration in the person retrieval process. Extensive experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements, with significant gains in the mean Average Precision (mAP) metric w.r.t. the current state of the art.

Results

TaskDatasetMetricValueModel
Text based Person RetrievalCUHK-PEDESR@177.62MARS
Text based Person RetrievalCUHK-PEDESR@1094.27MARS
Text based Person RetrievalCUHK-PEDESR@590.63MARS
Text based Person RetrievalCUHK-PEDESmAP71.41MARS
Text based Person RetrievalICFG-PEDESR@167.6MARS
Text based Person RetrievalICFG-PEDESR@1085.79MARS
Text based Person RetrievalICFG-PEDESR@581.47MARS
Text based Person RetrievalICFG-PEDESmAP44.93MARS
Text based Person RetrievalRSTPReidR@167.55MARS
Text based Person RetrievalRSTPReidR@1091.35MARS
Text based Person RetrievalRSTPReidR@586.65MARS
Text based Person RetrievalRSTPReidmAP52.92MARS

Related Papers

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning2025-07-17WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding2025-07-17MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM2025-07-16Non-Adaptive Adversarial Face Generation2025-07-16Attributes Shape the Embedding Space of Face Recognition Models2025-07-15COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation2025-07-15Try Harder: Hard Sample Generation and Learning for Clothes-Changing Person Re-ID2025-07-15Mind the Gap: Bridging Occlusion in Gait Recognition via Residual Gap Correction2025-07-15