Bin Yan, Yi Jiang, Peize Sun, Dong Wang, Zehuan Yuan, Ping Luo, Huchuan Lu
We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. Because the definition of the object tracking problem is itself fragmented, most existing trackers are developed to address only one task or a subset of tasks, and they overspecialize on the characteristics of those tasks. By contrast, Unicorn provides a unified solution, adopting the same input, backbone, embedding, and head across all tracking tasks. For the first time, we accomplish the great unification of the tracking network architecture and learning paradigm. Unicorn performs on par with or better than its task-specific counterparts on 8 tracking datasets, including LaSOT, TrackingNet, MOT17, BDD100K, DAVIS16-17, MOTS20, and BDD100K MOTS. We believe that Unicorn will serve as a solid step towards the general vision model. Code is available at https://github.com/MasterBin-IIAU/Unicorn.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | BDD100K val | mIDF1 | 54 | Unicorn |
| Video | BDD100K val | mMOTA | 41.2 | Unicorn |
| Video | NT-VOT211 | AUC | 34.52 | Unicorn |
| Video | NT-VOT211 | Precision | 47.77 | Unicorn |
| Multi-Object Tracking | MOTS20 | IDF1 | 65.9 | Unicorn |
| Multi-Object Tracking | MOTS20 | sMOTSA | 65.3 | Unicorn |
| Multi-Object Tracking | MOT17 | HOTA | 61.7 | Unicorn |
| Multi-Object Tracking | MOT17 | IDF1 | 75.5 | Unicorn |
| Multi-Object Tracking | MOT17 | MOTA | 77.2 | Unicorn |
| Object Tracking | MOTS20 | IDF1 | 65.9 | Unicorn |
| Object Tracking | MOTS20 | sMOTSA | 65.3 | Unicorn |
| Object Tracking | MOT17 | HOTA | 61.7 | Unicorn |
| Object Tracking | MOT17 | IDF1 | 75.5 | Unicorn |
| Object Tracking | MOT17 | MOTA | 77.2 | Unicorn |
| Object Tracking | LaSOT | AUC | 68.5 | Unicorn |
| Object Tracking | LaSOT | Normalized Precision | 76.6 | Unicorn |
| Object Tracking | LaSOT | Precision | 74.1 | Unicorn |
| Object Tracking | TrackingNet | Accuracy | 83 | Unicorn |
| Object Tracking | TrackingNet | Normalized Precision | 86.4 | Unicorn |
| Object Tracking | TrackingNet | Precision | 82.2 | Unicorn |
| Object Tracking | BDD100K val | mIDF1 | 54 | Unicorn |
| Object Tracking | BDD100K val | mMOTA | 41.2 | Unicorn |
| Object Tracking | NT-VOT211 | AUC | 34.52 | Unicorn |
| Object Tracking | NT-VOT211 | Precision | 47.77 | Unicorn |
| Multi-Object Tracking and Segmentation | BDD100K val | mMOTSA | 29.6 | Unicorn |
| Multiple Object Tracking | BDD100K val | mIDF1 | 54 | Unicorn |
| Multiple Object Tracking | BDD100K val | mMOTA | 41.2 | Unicorn |
| Visual Object Tracking | LaSOT | AUC | 68.5 | Unicorn |
| Visual Object Tracking | LaSOT | Normalized Precision | 76.6 | Unicorn |
| Visual Object Tracking | LaSOT | Precision | 74.1 | Unicorn |
| Visual Object Tracking | TrackingNet | Accuracy | 83 | Unicorn |
| Visual Object Tracking | TrackingNet | Normalized Precision | 86.4 | Unicorn |
| Visual Object Tracking | TrackingNet | Precision | 82.2 | Unicorn |
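
The table reports MOTA and IDF1 without defining them. As a reminder of what these numbers measure, the standard definitions (CLEAR MOT for MOTA, identity metrics for IDF1) can be sketched as below. The function names and the counts in the usage lines are illustrative assumptions; they are not taken from the paper or from the Unicorn codebase.

```python
def mota(num_false_negatives: int, num_false_positives: int,
         num_id_switches: int, num_gt_boxes: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    if num_gt_boxes <= 0:
        raise ValueError("num_gt_boxes must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_gt_boxes


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * idtp / denominator


# Hypothetical counts, chosen only to exercise the formulas.
print(f"MOTA: {100 * mota(120, 60, 20, 1000):.1f}")  # MOTA: 80.0
print(f"IDF1: {100 * idf1(750, 250, 250):.1f}")      # IDF1: 75.0
```

Both metrics are conventionally reported as percentages, which is how the values in the table should be read. MOTA can be negative when the total error count exceeds the number of ground-truth boxes.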