Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc van Gool
The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into an MOT model by employing an instance-level contrastive loss, a revised sampling strategy, and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities and adds little overhead. It surpasses the previous state of the art by 2.6 mMOTA on the challenging BDD100K dataset and performs comparably to existing transformer-based methods on the MOT17 dataset.
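
To make the idea of an instance-level contrastive loss concrete, the following is a minimal sketch of a generic supervised InfoNCE-style loss over object embeddings. It is not the paper's exact formulation: the function names, the `temperature` value, and the toy embeddings are all assumptions made for illustration. Embeddings that share an instance ID (the same object seen in different frames) are treated as positives, and all other embeddings in the batch as negatives.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """Generic supervised InfoNCE-style loss (illustrative, hypothetical helper).

    embeddings:   list of equal-length float lists, one per detected object
    instance_ids: list of identity labels, one per embedding
    temperature:  softmax temperature (assumed value, not from the paper)
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and instance_ids[j] == instance_ids[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Cosine similarity to every other embedding, scaled by temperature.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy example: two objects, each observed in two frames.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_consistent = instance_contrastive_loss(toy_embeddings, [7, 9, 7, 9])
loss_swapped = instance_contrastive_loss(toy_embeddings, [7, 9, 9, 7])
print(loss_consistent < loss_swapped)
```

In this toy setup the loss is lower when the identity labels agree with the embedding geometry, which is the behavior a tracker needs in order to match objects across frames by appearance.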
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | BDD100K test | mHOTA | 46.1 | ContrasTR |
| Video | BDD100K test | mIDF1 | 56.5 | ContrasTR |
| Video | BDD100K test | mMOTA | 42.8 | ContrasTR |
| Video | BDD100K val | mIDF1 | 52.9 | ContrasTR |
| Video | BDD100K val | mMOTA | 41.7 | ContrasTR |
| Multi-Object Tracking | MOT17 | HOTA | 58.9 | ContrasTR |
| Multi-Object Tracking | MOT17 | IDF1 | 71.8 | ContrasTR |
| Multi-Object Tracking | MOT17 | MOTA | 73.7 | ContrasTR |
| Object Tracking | MOT17 | HOTA | 58.9 | ContrasTR |
| Object Tracking | MOT17 | IDF1 | 71.8 | ContrasTR |
| Object Tracking | MOT17 | MOTA | 73.7 | ContrasTR |
| Object Tracking | BDD100K test | mHOTA | 46.1 | ContrasTR |
| Object Tracking | BDD100K test | mIDF1 | 56.5 | ContrasTR |
| Object Tracking | BDD100K test | mMOTA | 42.8 | ContrasTR |
| Object Tracking | BDD100K val | mIDF1 | 52.9 | ContrasTR |
| Object Tracking | BDD100K val | mMOTA | 41.7 | ContrasTR |
| Multiple Object Tracking | BDD100K test | mHOTA | 46.1 | ContrasTR |
| Multiple Object Tracking | BDD100K test | mIDF1 | 56.5 | ContrasTR |
| Multiple Object Tracking | BDD100K test | mMOTA | 42.8 | ContrasTR |
| Multiple Object Tracking | BDD100K val | mIDF1 | 52.9 | ContrasTR |
| Multiple Object Tracking | BDD100K val | mMOTA | 41.7 | ContrasTR |