Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu
Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, this paradigm is constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) neglect of the correlation among different classes. These limitations hinder the generalization of base-class knowledge to the detection of novel-class objects. In this work, we design Meta-DETR, a novel few-shot detection framework that incorporates correlational aggregation for meta-learning into DETR detection frameworks. Meta-DETR works entirely at the image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, Meta-DETR can attend to multiple support classes simultaneously within a single forward pass. This design captures the inter-class correlation among different classes, which significantly reduces the misclassification of similar classes and enhances knowledge generalization to novel classes. Experiments on multiple few-shot object detection benchmarks show that Meta-DETR outperforms state-of-the-art methods by large margins. The code will be released at https://github.com/ZhangGongjie/Meta-DETR.
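The idea of attending to multiple support classes in a single forward pass can be illustrated with a minimal NumPy sketch. This is a simplification under stated assumptions, not Meta-DETR's actual correlational aggregation: it assumes each support class is summarized by one prototype vector, that query-image features are flattened to one vector per spatial position, and the function name `aggregate_multi_class` and the residual combination are hypothetical. The property it demonstrates is that the softmax runs over the class axis, so all support classes are scored jointly for each position instead of one class per pass.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_multi_class(query_feats: np.ndarray, prototypes: np.ndarray):
    """Hypothetical simplified multi-class aggregation (not the paper's module).

    query_feats: (N, d) flattened query-image features, N = H * W positions.
    prototypes:  (C, d) one prototype vector per support class (assumption).
    Returns (weights, aggregated):
      weights:    (N, C) attention of each position over the C support classes.
      aggregated: (N, d) query features plus the attention-weighted prototypes.
    """
    d = query_feats.shape[1]
    # Scaled dot-product similarity between every position and every class prototype.
    scores = query_feats @ prototypes.T / np.sqrt(d)
    # Softmax over the class axis: classes compete for each position, so
    # similar classes are compared against each other in one pass.
    weights = softmax(scores, axis=1)
    # Hypothetical residual combination of query features and class information.
    aggregated = query_feats + weights @ prototypes
    return weights, aggregated


rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))  # a 4x4 feature map with 8 channels, flattened
p = rng.standard_normal((3, 8))   # prototypes for 3 support classes
w, agg = aggregate_multi_class(q, p)
print(w.shape, agg.shape)  # (16, 3) (16, 8)
```

Because every position distributes a total weight of 1 across all support classes, raising the score of one class necessarily lowers the others; processing one class per forward pass, as in per-class meta-learning pipelines, gives no such coupling.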
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Few-Shot Object Detection | MS-COCO (30-shot) | AP | 22.9 | Meta-DETR (Multi-Scale Feature) |
| Few-Shot Object Detection | MS-COCO (30-shot) | AP | 21.3 | Meta-DETR (Single-Scale Feature) |
| Few-Shot Object Detection | MS-COCO (10-shot) | AP | 17.8 | Meta-DETR (Multi-Scale Feature) |
| Few-Shot Object Detection | MS-COCO (10-shot) | AP | 16.7 | Meta-DETR (Single-Scale Feature) |
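The table reports two Meta-DETR variants, so the AP gain from multi-scale features can be read off directly. A small script that computes it from the table values (the dictionary layout and the function name `multi_scale_gain` are illustrative only):

```python
# AP values copied from the results table above.
results = {
    ("30-shot", "multi-scale"): 22.9,
    ("30-shot", "single-scale"): 21.3,
    ("10-shot", "multi-scale"): 17.8,
    ("10-shot", "single-scale"): 16.7,
}


def multi_scale_gain(shots: str) -> float:
    # AP difference between the multi-scale and single-scale variants,
    # rounded to one decimal to match the table's precision.
    return round(results[(shots, "multi-scale")] - results[(shots, "single-scale")], 1)


for shots in ("10-shot", "30-shot"):
    print(shots, multi_scale_gain(shots))
```

This prints a gain of 1.1 AP in the 10-shot setting and 1.6 AP in the 30-shot setting.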