Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT

Duy Le Dinh Anh, Kim Hoang Tran, Ngan Hoang Le

2024-09-04 · Multi-Object Tracking · Object Tracking · Multiple Object Tracking · object-detection · Object Detection

Abstract

While Multi-Object Tracking (MOT) has made substantial advancements, it relies heavily on prior knowledge and is limited to predefined categories. In contrast, Generic Multiple Object Tracking (GMOT), which tracks multiple objects with similar appearance, requires less prior information about the targets but faces challenges from variations in viewpoint, lighting, occlusion, and resolution. We first introduce the Refer-GMOT dataset, a collection of videos, each accompanied by fine-grained textual descriptions of their attributes. We then introduce a novel text prompt-based open-vocabulary GMOT framework, called TP-GMOT, which can track never-seen object categories with zero training examples. Within the TP-GMOT framework, we introduce two novel components: (i) TP-OD, object detection by a textual prompt, for accurately detecting unseen objects with specific characteristics; and (ii) Motion-Appearance Cost SORT (MAC-SORT), a novel object association approach that integrates motion-based and appearance-based matching strategies to tackle the complex task of tracking multiple generic objects with high similarity. Our contributions are benchmarked on the Refer-GMOT dataset for the GMOT task. Additionally, to assess the generalizability of the proposed TP-GMOT framework and the effectiveness of the MAC-SORT tracker, we conduct ablation studies on the DanceTrack and MOT20 datasets for the MOT task. Our dataset, code, and models will be publicly available at: https://fsoft-aic.github.io/TP-GMOT
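To make the idea of combining motion and appearance cues during association more concrete, here is a minimal sketch. It is not the MAC-SORT algorithm from the paper: the function names (`iou`, `cosine_distance`, `combined_cost`, `associate`), the fixed weight `w_app`, the threshold `max_cost`, and the greedy matching step are all assumptions made for illustration. SORT-style trackers typically solve the assignment with the Hungarian algorithm and predict boxes with a Kalman filter, both of which are omitted here.

```python
from math import sqrt


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def combined_cost(track, det, w_app=0.5):
    """Weighted sum of a motion cost (1 - IoU) and an appearance cost.

    w_app is a hypothetical fixed weight chosen for this sketch.
    """
    motion_cost = 1.0 - iou(track["box"], det["box"])
    appearance_cost = cosine_distance(track["feat"], det["feat"])
    return (1.0 - w_app) * motion_cost + w_app * appearance_cost


def associate(tracks, dets, w_app=0.5, max_cost=0.7):
    """Greedy matching: take the cheapest remaining (track, detection) pair
    until the cost exceeds max_cost. Returns sorted (track_idx, det_idx) pairs."""
    pairs = sorted(
        (combined_cost(t, d, w_app), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(dets)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost > max_cost:
            break
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return sorted(matches)


# Toy example: two existing tracks and two new detections (listed in swapped order).
tracks = [
    {"box": (0, 0, 10, 10), "feat": (1.0, 0.0)},
    {"box": (20, 0, 30, 10), "feat": (0.0, 1.0)},
]
dets = [
    {"box": (21, 0, 31, 10), "feat": (0.0, 1.0)},
    {"box": (1, 0, 11, 10), "feat": (1.0, 0.0)},
]
matches = associate(tracks, dets)
print(matches)  # [(0, 1), (1, 0)]
```

In this toy case each track is paired with the detection that both overlaps its previous box and shares its appearance feature. When objects look nearly identical, as in GMOT, the appearance term becomes less discriminative, which is one reason the relative weighting of the two costs matters.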

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Video | GMOT-40 | HOTA | 58.58 | MAC-SORT |
| Video | GMOT-40 | IDF1 | 71.7 | MAC-SORT |
| Video | GMOT-40 | MOTA | 67.77 | MAC-SORT |
| Object Tracking | GMOT-40 | HOTA | 58.58 | MAC-SORT |
| Object Tracking | GMOT-40 | IDF1 | 71.7 | MAC-SORT |
| Object Tracking | GMOT-40 | MOTA | 67.77 | MAC-SORT |
| Object Detection | GMOT-40 | mAP@0.5 | 72.7 | iGDINO MAC-SORT |
| 3D | GMOT-40 | mAP@0.5 | 72.7 | iGDINO MAC-SORT |
| Multiple Object Tracking | GMOT-40 | HOTA | 58.58 | MAC-SORT |
| Multiple Object Tracking | GMOT-40 | IDF1 | 71.7 | MAC-SORT |
| Multiple Object Tracking | GMOT-40 | MOTA | 67.77 | MAC-SORT |
| 2D Classification | GMOT-40 | mAP@0.5 | 72.7 | iGDINO MAC-SORT |
| 2D Object Detection | GMOT-40 | mAP@0.5 | 72.7 | iGDINO MAC-SORT |
| 16k | GMOT-40 | mAP@0.5 | 72.7 | iGDINO MAC-SORT |

Related Papers

MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)