Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang
This work presents Focal-Stable-DINO, a strong and reproducible object detection model that achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters and no test-time augmentation. It combines the powerful FocalNet-Huge backbone with the effective Stable-DINO detector. Unlike existing SOTA models, which rely on large parameter counts and complex training techniques applied to large-scale private or merged data, our model is trained exclusively on the publicly available Objects365 dataset, which ensures the reproducibility of our approach.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO test-dev | AP50 | 81.7 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO test-dev | AP75 | 71.5 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO test-dev | APL | 78 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO test-dev | APM | 67.6 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO test-dev | APS | 48.6 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO test-dev | Params (M) | 689 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO test-dev | box mAP | 64.8 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | AP50 | 81.5 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | AP75 | 71.4 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | APL | 78.5 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | APM | 68.5 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | APS | 50.4 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | Params (M) | 689 | Focal-Stable-DINO (Focal-Huge, no TTA) |
| Object Detection | COCO minival | box AP | 64.6 | Focal-Stable-DINO (Focal-Huge, no TTA) |