Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson
Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of the detection architecture. Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. During pretraining, DETReg predicts object localizations to match the localizations from an unsupervised region proposal generator and simultaneously aligns the corresponding feature embeddings with embeddings from a self-supervised image encoder. We implement DETReg using the DETR family of detectors and show that it improves over competitive baselines when finetuned on the COCO, PASCAL VOC, and Airbus Ship benchmarks. DETReg also improves performance in low-data regimes, e.g., when training with only 1% of the labels and in few-shot learning settings.
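
The pretraining objective described above, matching predicted localizations to region proposals while aligning the matched embeddings with encoder embeddings, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names (`l1`, `match_boxes`, `pretraining_loss`), the use of a mean L1 distance for both terms, the weights `w_box` and `w_emb`, and the brute-force matching (DETR-style detectors use Hungarian matching, which scales to realistic query counts). Any further loss terms the full method may use are omitted.

```python
from itertools import permutations


def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def match_boxes(pred_boxes, target_boxes):
    """Brute-force one-to-one assignment minimizing the total L1 box cost.

    Hypothetical helper: only practical for tiny inputs. It stands in for
    the Hungarian matching used by DETR-style detectors.
    Returns a list where entry t is the index of the prediction assigned
    to target (region proposal) t.
    """
    n_targets = len(target_boxes)
    best = min(
        permutations(range(len(pred_boxes)), n_targets),
        key=lambda perm: sum(
            l1(pred_boxes[p], target_boxes[t]) for t, p in enumerate(perm)
        ),
    )
    return list(best)


def pretraining_loss(pred_boxes, pred_embs, target_boxes, target_embs,
                     w_box=1.0, w_emb=1.0):
    """Weighted sum of a box regression term and an embedding alignment term.

    target_boxes: boxes from an unsupervised region proposal generator.
    target_embs:  embeddings of those regions from a self-supervised encoder.
    The weights and the L1 form of both terms are illustrative assumptions.
    """
    assignment = match_boxes(pred_boxes, target_boxes)
    n = len(assignment)
    box_loss = sum(
        l1(pred_boxes[p], target_boxes[t]) for t, p in enumerate(assignment)
    ) / n
    emb_loss = sum(
        l1(pred_embs[p], target_embs[t]) for t, p in enumerate(assignment)
    ) / n
    return w_box * box_loss + w_emb * emb_loss, assignment


if __name__ == "__main__":
    # Two predictions and two proposals, boxes as (cx, cy, w, h) in [0, 1].
    pred_boxes = [[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1]]
    pred_embs = [[1.0, 0.0], [0.0, 1.0]]
    target_boxes = [[0.1, 0.1, 0.1, 0.1], [0.5, 0.5, 0.2, 0.2]]
    target_embs = [[0.0, 1.0], [1.0, 0.0]]
    loss, assignment = pretraining_loss(
        pred_boxes, pred_embs, target_boxes, target_embs
    )
    print(loss, assignment)
```

In this toy input the predictions are a permutation of the targets, so the matching step recovers the permutation and both loss terms vanish. In an actual training loop the two terms would be computed on detector outputs and backpropagated through the whole network, which is what distinguishes this setup from backbone-only pretraining.
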
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | PASCAL VOC 10% | AP | 51.4 | DETReg (ours) |
| Object Detection | PASCAL VOC 10% | AP50 | 72.2 | DETReg (ours) |
| Object Detection | PASCAL VOC 10% | AP75 | 56.6 | DETReg (ours) |
| Object Detection | COCO 2017 | AP | 30 | DETReg (ours) |
| Object Detection | MS-COCO (30-shot) | AP | 30 | DETReg-ft-full DDETR |
| Object Detection | MS-COCO (10-shot) | AP | 25 | DETReg-ft-full DDETR |
| Few-Shot Object Detection | COCO 2017 | AP | 30 | DETReg (ours) |
| Few-Shot Object Detection | MS-COCO (30-shot) | AP | 30 | DETReg-ft-full DDETR |
| Few-Shot Object Detection | MS-COCO (10-shot) | AP | 25 | DETReg-ft-full DDETR |
| Unsupervised Instance Segmentation | COCO val2017 | AP | 3.3 | DETReg |
| Unsupervised Instance Segmentation | COCO val2017 | AP50 | 8.8 | DETReg |
| Unsupervised Instance Segmentation | COCO val2017 | AP75 | 1.9 | DETReg |