Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang
We present a novel query formulation for DETR (DEtection TRansformer) based on dynamic anchor boxes and offer a deeper understanding of the role of queries in DETR. The new formulation uses box coordinates directly as queries in the Transformer decoder and updates them dynamically, layer by layer. Using box coordinates not only provides explicit positional priors that improve query-to-feature similarity and eliminate DETR's slow training convergence, but also allows us to modulate the positional attention map with the box width and height. This design makes it clear that queries in DETR can be implemented as soft ROI pooling performed layer by layer in a cascade manner. As a result, it achieves the best performance on the MS-COCO benchmark among DETR-like detection models under the same setting, e.g., 45.7% AP using ResNet50-DC5 as the backbone trained for 50 epochs. We also conduct extensive experiments to confirm our analysis and verify the effectiveness of our method. Code is available at https://github.com/SlongLiu/DAB-DETR.
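
To make the layer-by-layer anchor update concrete, here is a minimal sketch in plain Python. It is not taken from the released code: it assumes each anchor is a normalized `(cx, cy, w, h)` box and that each decoder layer emits an offset that is added in logit (inverse-sigmoid) space, which is one common way to keep refined boxes inside the unit square. The names `refine_anchor` and `run_decoder` and the offset values are hypothetical, chosen only for illustration.

```python
import math


def inverse_sigmoid(p, eps=1e-5):
    """Map a value in (0, 1) to logit space, clamping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Map a logit back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def refine_anchor(anchor, delta):
    """Apply one layer's offset to an anchor box.

    anchor: (cx, cy, w, h), each normalized to (0, 1).
    delta:  per-coordinate offset in logit space (assumed to come from
            a decoder layer's prediction head; hypothetical here).
    """
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


def run_decoder(anchor, per_layer_deltas):
    """Refine an anchor across layers and return the anchor after each step."""
    history = [anchor]
    for delta in per_layer_deltas:
        anchor = refine_anchor(anchor, delta)
        history.append(anchor)
    return history


if __name__ == "__main__":
    # Illustrative offsets only; a trained model would predict these.
    start = (0.5, 0.5, 0.2, 0.2)
    deltas = [(0.3, -0.1, 0.2, 0.1), (0.1, 0.0, 0.1, 0.05)]
    for layer, box in enumerate(run_decoder(start, deltas)):
        print(layer, [round(v, 3) for v in box])
```

Working in logit space means an offset of any magnitude still yields coordinates strictly between 0 and 1, so the refined box from one layer can serve directly as the query for the next.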
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO minival | AP50 | 67 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | AP75 | 50.2 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | APL | 64.1 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | APM | 50.5 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | APS | 28.1 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | Params (M) | 63 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | box AP | 46.6 | DAB-DETR-DC5-R101 |
| Object Detection | COCO minival | AP50 | 64.7 | DAB-DETR-R101 |
| Object Detection | COCO minival | AP75 | 47.2 | DAB-DETR-R101 |
| Object Detection | COCO minival | APL | 62.9 | DAB-DETR-R101 |
| Object Detection | COCO minival | APM | 48.2 | DAB-DETR-R101 |
| Object Detection | COCO minival | APS | 24.1 | DAB-DETR-R101 |
| Object Detection | COCO minival | Params (M) | 63 | DAB-DETR-R101 |
| Object Detection | COCO minival | box AP | 44.1 | DAB-DETR-R101 |