Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin

Published: 2021-11-23
Tasks: Self-Supervised Learning, Person Re-Identification, Unsupervised Person Re-Identification, Unsupervised Domain Adaptation, Domain Adaptation

Abstract

Transformer-based supervised pre-training achieves great performance in person re-identification (ReID). However, due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset (e.g. ImageNet-21K) to boost performance because of the transformer's strong data-fitting ability. To address this challenge, this work aims to mitigate the gap between the pre-training and ReID datasets from the perspectives of data and model structure, respectively. We first investigate self-supervised learning (SSL) methods with Vision Transformers (ViT) pre-trained on unlabelled person images (the LUPerson dataset), and empirically find that they significantly surpass ImageNet supervised pre-training models on ReID tasks. To further reduce the domain gap and accelerate pre-training, the Catastrophic Forgetting Score (CFS) is proposed to evaluate the gap between pre-training and fine-tuning data. Based on CFS, a subset is selected by sampling relevant data close to the downstream ReID data and filtering irrelevant data from the pre-training dataset. For the model structure, a ReID-specific module named the IBN-based convolution stem (ICS) is proposed to bridge the domain gap by learning more invariant features. Extensive experiments have been conducted to fine-tune the pre-trained models under supervised learning, unsupervised domain adaptation (UDA), and unsupervised learning (USL) settings. We successfully downscale the LUPerson dataset to 50% with no performance degradation. Finally, we achieve state-of-the-art performance on Market-1501 and MSMT17. For example, our ViT-S/16 achieves 91.3%/89.9%/89.6% mAP on Market-1501 for supervised/UDA/USL ReID. Code and models will be released at https://github.com/michuanhaohao/TransReID-SSL.
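The CFS-based subset selection described above can be loosely sketched in code. The sketch below is an illustrative assumption, not the paper's exact formulation: it assumes features have already been extracted by a pre-trained encoder, scores each pre-training sample by its maximum cosine similarity to the downstream ReID features, and keeps the top fraction (e.g. 50%, matching the downscaling reported in the abstract). The function name `cfs_select` and the similarity-based proxy score are hypothetical.

```python
import numpy as np

def cfs_select(pretrain_feats, downstream_feats, keep_ratio=0.5):
    """Keep the pre-training samples most similar to the downstream data.

    A loose sketch of CFS-style subset selection: samples "close" to the
    fine-tuning distribution are retained, the rest are filtered out.
    """
    # L2-normalize both feature sets so dot products are cosine similarities
    p = pretrain_feats / np.linalg.norm(pretrain_feats, axis=1, keepdims=True)
    d = downstream_feats / np.linalg.norm(downstream_feats, axis=1, keepdims=True)
    # Score each pre-training sample by its best match in the downstream set
    scores = (p @ d.T).max(axis=1)
    # Retain the top keep_ratio fraction by score
    k = int(len(scores) * keep_ratio)
    keep_idx = np.argsort(-scores)[:k]
    return keep_idx, scores

# Toy usage with random features (8-dim for brevity)
rng = np.random.default_rng(0)
pretrain = rng.normal(size=(100, 8))    # stand-in for LUPerson features
downstream = rng.normal(size=(20, 8))   # stand-in for ReID dataset features
idx, scores = cfs_select(pretrain, downstream, keep_ratio=0.5)
```

In this sketch, halving `keep_ratio` halves the pre-training set while biasing it toward samples that resemble the downstream data, which is the intuition behind downscaling LUPerson to 50% without losing performance.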

Results

Task | Dataset | Metric | Value | Model
Person Re-Identification | MSMT17 | Rank-1 | 89.5 | TransReID-SSL (ViT-B without RK)
Person Re-Identification | MSMT17 | mAP | 75 | TransReID-SSL (ViT-B without RK)
Person Re-Identification | MSMT17 | Rank-1 | 89.6 | TransReID-SSL (without RK)
Person Re-Identification | Market-1501 | Rank-1 | 96.7 | TransReID-SSL (ViT-B w/o RK)
Person Re-Identification | Market-1501 | mAP | 93.2 | TransReID-SSL (ViT-B w/o RK)
Person Re-Identification | MSMT17 | Rank-1 | 75 | TransReID-SSL (ViTi-S)
Person Re-Identification | MSMT17 | mAP | 50.6 | TransReID-SSL (ViTi-S)
Person Re-Identification | MSMT17 | Rank-1 | 66.4 | TransReID-SSL (ViT-S)
Person Re-Identification | MSMT17 | mAP | 40.9 | TransReID-SSL (ViT-S)
Person Re-Identification | Market-1501 | mAP | 89.6 | TransReID-SSL (ViTi-S)
Person Re-Identification | Market-1501 | Rank-1 | 95.3 | TransReID-SSL (ViTi-S)
Person Re-Identification | Market-1501 | mAP | 88.2 | TransReID-SSL (ViT-S)
Person Re-Identification | Market-1501 | Rank-1 | 94.2 | TransReID-SSL (ViT-S)
Person Re-Identification | Market-1501 | Rank-1 | 95.3 | TransReID-SSL (ViT-S w/o RK)

Related Papers

- A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
- Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning (2025-07-17)
- WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
- Try Harder: Hard Sample Generation and Learning for Clothes-Changing Person Re-ID (2025-07-15)
- Mind the Gap: Bridging Occlusion in Gait Recognition via Residual Gap Correction (2025-07-15)
- Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
- Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation (2025-07-14)