Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels

Siyuan Li, Li Sun, Qingli Li

2022-11-25
Tasks: Vehicle Re-Identification, Image Classification, Person Re-Identification, Language Modelling

Abstract

Pre-trained vision-language models such as CLIP have recently shown superior performance on various downstream tasks, including image classification and segmentation. In fine-grained image re-identification (ReID), however, the labels are indexes that lack concrete text descriptions, so it remains unclear how such models can be applied to these tasks. This paper first finds that simply fine-tuning a visual model initialized from CLIP's image encoder already achieves competitive performance on various ReID tasks. We then propose a two-stage strategy to learn a better visual representation. The key idea is to fully exploit CLIP's cross-modal description ability through a set of learnable text tokens for each ID, which are fed to the text encoder to form ambiguous descriptions. In the first training stage, CLIP's image and text encoders are kept fixed, and only the text tokens are optimized from scratch with a contrastive loss computed within each batch. In the second stage, the ID-specific text tokens and the text encoder are frozen and provide constraints for fine-tuning the image encoder. Together with the loss designed for the downstream task, this allows the image encoder to represent data accurately as vectors in the feature embedding space. The effectiveness of the proposed strategy is validated on several person and vehicle ReID datasets. Code is available at https://github.com/Syliz517/CLIP-ReID.
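The abstract describes the two training stages only in words. The following is a minimal pure-Python sketch of the two kinds of loss terms it mentions, not the official implementation. It assumes cosine similarity as the image-text score, a hypothetical `scale` temperature, and plain feature vectors standing in for the outputs of CLIP's frozen encoders. The text-to-image term treats every batch image sharing an ID as a positive, and the stage 2 term is a cross-entropy over all IDs' frozen text features. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def stage1_loss(img_feats, text_feats, labels, scale=1.0):
    """Stage 1 (assumed form): both encoders are frozen, so img_feats are
    constants; only the per-ID text features (standing in for the encoded
    learnable tokens) would be updated. Contrast is computed within the batch.

    img_feats[i]: feature of batch image i; labels[i]: its ID
    text_feats[y]: text feature for ID y
    """
    n = len(img_feats)
    # sim[a][b]: similarity of image a to the text of image b's ID
    sim = [[scale * cosine(img_feats[a], text_feats[labels[b]]) for b in range(n)]
           for a in range(n)]
    # Image-to-text: image i should score highest against its own ID's text.
    l_i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-image: the text of ID labels[i] is contrasted against all batch
    # images; every image that shares this ID counts as a positive.
    l_t2i = 0.0
    for i in range(n):
        lp = log_softmax([sim[a][i] for a in range(n)])
        pos = [a for a in range(n) if labels[a] == labels[i]]
        l_t2i -= sum(lp[a] for a in pos) / len(pos)
    return l_i2t + l_t2i / n


def stage2_i2t_ce(img_feat, all_text_feats, label, scale=1.0, eps=0.0):
    """Stage 2 (assumed form): the text features of all IDs are frozen and act
    as class anchors; the feature from the image encoder being fine-tuned is
    scored against every ID's text with cross-entropy, optionally
    label-smoothed by eps. Ordinary ReID losses would be added on top."""
    lp = log_softmax([scale * cosine(img_feat, t) for t in all_text_feats])
    n_ids = len(all_text_feats)
    target = [(1.0 - eps if k == label else 0.0) + eps / n_ids for k in range(n_ids)]
    return -sum(q * l for q, l in zip(target, lp))


imgs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
labels = [0, 0, 1]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # text of each ID points toward its images
swapped = [[0.0, 1.0], [1.0, 0.0]]  # text of each ID points toward the other ID
print(stage1_loss(imgs, aligned, labels) < stage1_loss(imgs, swapped, labels))
```

In stage 1, gradients of `stage1_loss` would flow only into the text side. In stage 2, the same text features are held fixed, and `stage2_i2t_ce` pulls each image feature toward its own ID's description and away from the others.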

Results

| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Person Re-Identification | MSMT17 | Rank-1 | 91.2 | CLIP-ReID (with re-ranking) |
| Person Re-Identification | MSMT17 | mAP | 86.7 | CLIP-ReID (with re-ranking) |
| Person Re-Identification | MSMT17 | Rank-1 | 89.7 | CLIP-ReID (without re-ranking) |
| Person Re-Identification | MSMT17 | mAP | 75.8 | CLIP-ReID (without re-ranking) |
| Person Re-Identification | Market-1501 | Rank-1 | 95.4 | CLIP-ReID (without re-ranking) |
| Person Re-Identification | Market-1501 | mAP | 90.5 | CLIP-ReID (without re-ranking) |
| Person Re-Identification | DukeMTMC-reID | Rank-1 | 90.8 | CLIP-ReID (without re-ranking) |
| Person Re-Identification | DukeMTMC-reID | mAP | 83.1 | CLIP-ReID (without re-ranking) |
| Vehicle Re-Identification | VeRi-776 | Rank-1 | 97.3 | CLIP-ReID (without re-ranking) |
| Vehicle Re-Identification | VeRi-776 | mAP | 84.5 | CLIP-ReID (without re-ranking) |
| Vehicle Re-Identification | VehicleID Small | Rank-1 | 85.5 | CLIP-ReID (without re-ranking) |
| Vehicle Re-Identification | VehicleID Small | Rank-5 | 97.2 | CLIP-ReID (without re-ranking) |

The VeRi-776 and VehicleID Small results are also listed, with identical values, under the Intelligent Surveillance task.
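The results above are reported as Rank-k accuracy (the CMC curve at rank k) and mean average precision (mAP). The following minimal sketch shows how these metrics are commonly computed from a query-gallery distance matrix. It is not the official evaluation code. It assumes the Market-1501-style convention of ignoring gallery items that share both ID and camera with the query, skips queries with no valid match, and does not implement re-ranking or dataset-specific gallery protocols. The `evaluate` function and its arguments are illustrative.

```python
def evaluate(dist, q_ids, g_ids, q_cams=None, g_cams=None, ranks=(1, 5)):
    """Return ({k: Rank-k accuracy}, mAP), both in percent.

    dist[i][j] is the distance between query i and gallery item j (smaller
    means more similar). If camera IDs are given, gallery items with the same
    ID *and* the same camera as the query are skipped.
    """
    hits = {k: 0 for k in ranks}
    ap_sum, valid = 0.0, 0
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda j: row[j])  # nearest first
        if q_cams is not None:
            order = [j for j in order
                     if not (g_ids[j] == q_ids[i] and g_cams[j] == q_cams[i])]
        matches = [g_ids[j] == q_ids[i] for j in order]
        if not any(matches):
            continue  # no valid true match for this query; leave it out
        valid += 1
        first = matches.index(True)  # 0-based position of the first true match
        for k in ranks:
            if first < k:
                hits[k] += 1
        # Average precision: mean of precision@r taken at each true-match rank r.
        n_rel, prec_sum = 0, 0.0
        for r, is_match in enumerate(matches, start=1):
            if is_match:
                n_rel += 1
                prec_sum += n_rel / r
        ap_sum += prec_sum / n_rel
    if valid == 0:
        raise ValueError("no query has a valid match in the gallery")
    cmc = {k: 100.0 * hits[k] / valid for k in ranks}
    return cmc, 100.0 * ap_sum / valid


dist = [[0.1, 0.5, 0.9], [0.1, 0.5, 0.3]]
cmc, mean_ap = evaluate(dist, q_ids=[1, 2], g_ids=[1, 2, 2])
```

In this toy case, the first query's true match is ranked first and the second query's first true match is ranked second, so Rank-1 is 50.0 and Rank-5 is 100.0.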

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning (2025-07-17)
- WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding (2025-07-17)