Fangyi Chen, Han Zhang, Kai Hu, Yu-Kai Huang, Chenchen Zhu, Marios Savvides
This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage. We review the training process and attribute this overlooked phenomenon to two limitations: a lack of training emphasis and cascading errors from the decoding sequence. We design and present Selective Query Recollection (SQR), a simple and effective training strategy for query-based object detectors. It cumulatively collects intermediate queries as decoding stages go deeper and selectively forwards those queries to the downstream stages, in addition to the usual sequential path. In this way, SQR places training emphasis on later stages and allows later stages to work directly with intermediate queries from earlier stages. SQR can be easily plugged into various query-based object detectors and significantly enhances their performance while leaving the inference pipeline unchanged. We apply SQR to Adamixer, DAB-DETR, and Deformable-DETR across various settings (backbone, number of queries, schedule), and it consistently brings a 1.4-2.8 AP improvement.
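The collect-and-forward idea described above can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the function name `decode_with_recollection`, the `recall_depth` parameter, and the toy stage are all hypothetical, and the selection rule (forwarding the outputs of the two most recent stages) is an assumption, since the abstract does not spell out which queries are selected.

```python
from typing import Callable, List

Query = List[float]               # stand-in for a set of object queries
Stage = Callable[[Query], Query]  # stand-in for one decoder stage


def decode_with_recollection(
    stages: List[Stage],
    init_query: Query,
    recall_depth: int = 2,  # hypothetical knob; 1 reproduces plain sequential decoding
) -> List[List[Query]]:
    """Return, for every stage, all outputs it produced.

    Before each stage runs, the outputs of the `recall_depth` most recent
    stages are collected and each is fed to the stage independently.
    At training time every output would receive its own loss (assumed here).
    """
    # history[0] holds the initial queries; history[s] holds stage s outputs.
    history: List[List[Query]] = [[init_query]]
    for stage in stages:
        # Recollect queries from the most recent stages (selection rule is assumed).
        inputs = [q for outputs in history[-recall_depth:] for q in outputs]
        history.append([stage(q) for q in inputs])
    return history[1:]


# Toy stage: "refines" a query by adding 1 to each element.
toy_stages: List[Stage] = [lambda q: [x + 1.0 for x in q]] * 6

train_outputs = decode_with_recollection(toy_stages, [0.0], recall_depth=2)
infer_outputs = decode_with_recollection(toy_stages, [0.0], recall_depth=1)

print([len(o) for o in train_outputs])  # supervised outputs per stage in training
print([len(o) for o in infer_outputs])  # one output per stage at inference
```

Under this assumed rule, later stages receive more supervised inputs than earlier ones (1, 2, 3, 5, 8, 13 for six stages), which is one way to realize the extra training emphasis on later stages, while `recall_depth=1` recovers the unchanged sequential pipeline used at inference.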
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO 2017 val | AP | 49.8 | SQR-Adamixer-R101 |
| Object Detection | COCO 2017 val | AP | 48.9 | SQR-Adamixer-R50 |