Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

Jianqiang Xia, Dianxi Shi, Ke Song, Linna Song, Xiaolei Wang, Songchang Jin, Li Zhou, Yu Cheng, Lei Jin, Zheng Zhu, Jianan Li, Gang Wang, Junliang Xing, Jian Zhao

2023-08-26 · feature selection · RGB-T Tracking

Paper · PDF · Code (official)

Abstract

Most existing RGB-T tracking networks extract modality features separately, which lacks interaction and mutual guidance between modalities. This limits the network's ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restricts the tracking speed. To overcome these problems, we propose a unified single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the above three stages into a single ViT (Vision Transformer) backbone with a dual embedding layer through the self-attention mechanism. With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities. Simultaneously, relation modeling is performed between these features, efficiently obtaining search-region fusion features with better target-background discriminability for prediction. Furthermore, we introduce a novel feature selection mechanism based on modality reliability to mitigate the influence of invalid modalities on prediction, further improving the tracking performance. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance while maintaining the fastest inference speed of 84.2 FPS. In particular, MPR/MSR on the short-term and long-term subsets of the VTUAV dataset increase by 11.1$\%$/11.7$\%$ and 11.3$\%$/9.7$\%$.
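The two key ideas in the abstract can be illustrated with a minimal NumPy sketch: (1) the single-stage design concatenates template and search tokens of both modalities into one sequence so a single self-attention pass lets them interact, and (2) search-region features are fused with weights derived from per-modality reliability scores, down-weighting an unreliable modality. This is a toy illustration, not the paper's implementation; all shapes, function names, and the scalar reliability scores are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(tokens, Wq, Wk, Wv):
    """One self-attention pass over the concatenated token sequence,
    so template/search tokens of both modalities interact directly."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def select_by_reliability(feat_rgb, feat_tir, score_rgb, score_tir):
    """Fuse per-modality search features with softmax-normalized
    reliability weights, so an invalid modality contributes less."""
    w = softmax(np.array([score_rgb, score_tir]))
    return w[0] * feat_rgb + w[1] * feat_tir

D = 8                                  # toy embedding width (assumption)
tpl_rgb = rng.normal(size=(4, D))      # RGB template tokens
tpl_tir = rng.normal(size=(4, D))      # thermal template tokens
srch_rgb = rng.normal(size=(16, D))    # RGB search-region tokens
srch_tir = rng.normal(size=(16, D))    # thermal search-region tokens

# Single-stage idea: one joint sequence, one backbone pass.
tokens = np.concatenate([tpl_rgb, tpl_tir, srch_rgb, srch_tir])  # (40, D)
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
out = joint_self_attention(tokens, Wq, Wk, Wv)                   # (40, D)

# Split the search-region outputs per modality and fuse by reliability.
out_rgb, out_tir = out[8:24], out[24:40]
fused = select_by_reliability(out_rgb, out_tir, score_rgb=2.0, score_tir=-2.0)
```

With the high RGB score, the fused features lean heavily on the RGB branch; in the actual network the reliability scores would be predicted from the features themselves rather than fixed by hand.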

Results

Task             Dataset   Metric     Value   Model
Visual Tracking  GTOT      Precision  93.4    USTrack
Visual Tracking  GTOT      Success    78.3    USTrack
Visual Tracking  RGBT234   Precision  87.4    USTrack
Visual Tracking  RGBT234   Success    65.8    USTrack

Related Papers

mNARX+: A surrogate model for complex dynamical systems using manifold-NARX and automatic feature selection (2025-07-17)
Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection (2025-07-15)
Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning (2025-07-14)
From Motion to Meaning: Biomechanics-Informed Neural Network for Explainable Cardiovascular Disease Identification (2025-07-08)
Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS (2025-06-25)
Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach (2025-06-25)
scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection (2025-06-25)
Lightweight RGB-T Tracking with Mobile Vision Transformers (2025-06-23)