Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking. In this paper, we present a new online joint detection and tracking model, TraDeS (TRAck to DEtect and Segment), which exploits tracking clues to assist detection end-to-end. TraDeS infers object tracking offsets from a cost volume, and these offsets are used to propagate previous object features to improve current object detection and segmentation. The effectiveness and superiority of TraDeS are demonstrated on 4 datasets: MOT (2D tracking), nuScenes (3D tracking), MOTS and YouTube-VIS (instance segmentation tracking). Project page: https://jialianwu.com/projects/TraDeS.html.
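The abstract's core mechanism infers a tracking offset from a cost volume and then uses that offset to carry previous-frame features forward. It can be illustrated with a minimal NumPy sketch. This is not the TraDeS implementation. The dense (H·W) × (H·W) cost volume with a full soft-argmax over previous-frame locations, nearest-neighbor warping, and fusion by plain averaging are all simplifying assumptions chosen for readability. The function names `cost_volume_offsets` and `propagate_features` are hypothetical, and the inputs are assumed to be per-pixel embeddings and features of frames t and t-tau.

```python
import numpy as np

def cost_volume_offsets(emb_cur, emb_prev):
    """Soft-argmax tracking offsets from a dense cost volume (simplified sketch).

    emb_cur, emb_prev: (H, W, C) per-pixel embeddings of frames t and t-tau.
    Returns (H, W, 2) offsets (dy, dx) pointing from each current location
    to its expected matching location in the previous frame.
    """
    H, W, C = emb_cur.shape
    cur = emb_cur.reshape(H * W, C)
    prev = emb_prev.reshape(H * W, C)
    # Cost volume: similarity of every current location to every previous one,
    # shape (H*W, H*W), i.e. a flattened H x W x H x W volume.
    cost = cur @ prev.T
    # Softmax over previous-frame locations (max-subtracted for stability).
    cost = cost - cost.max(axis=1, keepdims=True)
    prob = np.exp(cost)
    prob /= prob.sum(axis=1, keepdims=True)
    # (y, x) coordinates of every grid location, shape (H*W, 2).
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # Expected matching position in the previous frame minus current position.
    expected_prev = prob @ coords
    return (expected_prev - coords).reshape(H, W, 2)

def propagate_features(feat_cur, feat_prev, offsets):
    """Warp previous features along the offsets and fuse them with current ones.

    Assumption: nearest-neighbor sampling and averaging stand in for whatever
    learned warping/fusion a real model would use.
    """
    H, W, _ = feat_cur.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Round each offset target to the nearest valid grid cell.
    src_y = np.clip(np.rint(ys + offsets[..., 0]), 0, H - 1).astype(int)
    src_x = np.clip(np.rint(xs + offsets[..., 1]), 0, W - 1).astype(int)
    warped = feat_prev[src_y, src_x]  # (H, W, C) previous features, aligned to frame t
    return 0.5 * (feat_cur + warped)
```

With one-hot embeddings for a scene that shifts one column to the right between frames, every location that has a match gets an offset of about (0, -1), and its fused feature picks up the previous feature from the column to its left. Note that a dense volume needs O((H·W)²) memory. That cost is why such a volume would in practice be built on downsampled feature maps.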
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Multi-Object Tracking | MOTS20 | IDF1 | 58.7 | TraDeS |
| Multi-Object Tracking | MOTS20 | sMOTSA | 50.8 | TraDeS |
| Multi-Object Tracking | MOT15 | MOTA | 66.5 | Baseline+MFW |
| Multi-Object Tracking | MOT17 | IDF1 | 63.9 | TraDeS |
| Multi-Object Tracking | MOT17 | MOTA | 69.1 | TraDeS |
| Multi-Object Tracking | MOT16 | IDF1 | 64.7 | TraDeS |
| Multi-Object Tracking | MOT16 | MOTA | 70.1 | TraDeS |
| Multi-Object Tracking | DanceTrack | AssA | 25.4 | TraDeS |
| Multi-Object Tracking | DanceTrack | DetA | 74.5 | TraDeS |
| Multi-Object Tracking | DanceTrack | HOTA | 43.3 | TraDeS |
| Multi-Object Tracking | DanceTrack | IDF1 | 41.2 | TraDeS |
| Multi-Object Tracking | DanceTrack | MOTA | 86.2 | TraDeS |
| Object Tracking | MOTS20 | IDF1 | 58.7 | TraDeS |
| Object Tracking | MOTS20 | sMOTSA | 50.8 | TraDeS |
| Object Tracking | MOT15 | MOTA | 66.5 | Baseline+MFW |
| Object Tracking | MOT17 | IDF1 | 63.9 | TraDeS |
| Object Tracking | MOT17 | MOTA | 69.1 | TraDeS |
| Object Tracking | MOT16 | IDF1 | 64.7 | TraDeS |
| Object Tracking | MOT16 | MOTA | 70.1 | TraDeS |
| Object Tracking | DanceTrack | AssA | 25.4 | TraDeS |
| Object Tracking | DanceTrack | DetA | 74.5 | TraDeS |
| Object Tracking | DanceTrack | HOTA | 43.3 | TraDeS |
| Object Tracking | DanceTrack | IDF1 | 41.2 | TraDeS |
| Object Tracking | DanceTrack | MOTA | 86.2 | TraDeS |
| Object Tracking | MOT16 | MOTA | 67.7 | TraDeS |
| Instance Segmentation | nuScenes | MOTA | 68.2 | TraDeS |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 52.6 | TraDeS |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 32.8 | TraDeS |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 32.6 | TraDeS |