Boyuan Meng, Xiaohan Zhang, Peilin Li, Zhe Wu, Yiming Li, Wenkai Zhao, Beinan Yu, Hui-Liang Shen
Cross-domain few-shot object detection (CD-FSOD) aims to detect novel objects across different domains with limited class instances. Feature confusion, including object-background confusion and object-object confusion, presents significant challenges in both cross-domain and few-shot settings. In this work, we introduce CDFormer, a cross-domain few-shot object detection transformer against feature confusion, to address these challenges. The method tackles feature confusion through two key modules: object-background distinguishing (OBD) and object-object distinguishing (OOD). The OBD module leverages a learnable background token to differentiate between objects and background, while the OOD module enhances the distinction between objects of different classes. Experimental results demonstrate that, when fine-tuned, CDFormer outperforms previous state-of-the-art approaches, improving mAP by 12.9%, 11.0%, and 10.4% under the 1-, 5-, and 10-shot settings, respectively.
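To make the idea of a background token concrete, here is a minimal sketch, not the CDFormer implementation. It assumes that each candidate region is represented by a feature vector, that each novel class has a prototype vector built from the few support examples, and that one extra vector stands in for the background. In a trained model that background vector would be learned; here it is a fixed placeholder. The names `classify_region`, `class_prototypes`, and `background_token`, as well as the temperature value, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature is an assumed hyperparameter."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_region(feature, class_prototypes, background_token):
    """Score a region against every class prototype plus one background vector.

    The background vector competes with the class prototypes in the same
    softmax, so a region that matches no class can be assigned to
    "background" instead of being forced onto the nearest class.
    """
    names = list(class_prototypes) + ["background"]
    vectors = list(class_prototypes.values()) + [background_token]
    probs = softmax([cosine(feature, v) for v in vectors])
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best], probs[best]


if __name__ == "__main__":
    # Toy 3-dimensional vectors chosen only for illustration.
    prototypes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
    background = [0.0, 0.0, 1.0]  # placeholder for a learned token
    print(classify_region([0.9, 0.1, 0.0], prototypes, background))
    print(classify_region([0.05, 0.05, 1.0], prototypes, background))
```

The design point this illustrates is that background is treated as an explicit competitor rather than as a low-confidence leftover, which is one plausible way to reduce object-background confusion when only a few labeled instances are available.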
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | Artaxor | mAP | 68.7 | CDFormer(w/FT) |
| Object Detection | Artaxor | mAP | 37.3 | CDFormer(w/o FT) |
| Object Detection | NEU-DET | mAP | 18.1 | CDFormer(w/FT) |
| Object Detection | NEU-DET | mAP | 4 | CDFormer(w/o FT) |
| Object Detection | DIOR | mAP | 32.5 | CDFormer(w/FT) |
| Object Detection | DIOR | mAP | 7.9 | CDFormer(w/o FT) |
| Object Detection | Clipart1k | mAP | 59 | CDFormer(w/FT) |
| Object Detection | Clipart1k | mAP | 53.5 | CDFormer(w/o FT) |
| Object Detection | DeepFish | mAP | 35.5 | CDFormer(w/FT) |
| Object Detection | DeepFish | mAP | 25.7 | CDFormer(w/o FT) |
| Object Detection | UODD | mAP | 26.4 | CDFormer(w/FT) |
| Object Detection | UODD | mAP | 16.7 | CDFormer(w/o FT) |
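The effect of fine-tuning can be read off the table by pairing each dataset's w/FT and w/o FT rows. The short script below does that arithmetic. It assumes the two rows for a dataset come from the same evaluation setting, which the table does not state explicitly. The variable and function names are illustrative only.

```python
# mAP values copied from the table: (with fine-tuning, without fine-tuning).
RESULTS = {
    "Artaxor": (68.7, 37.3),
    "NEU-DET": (18.1, 4.0),
    "DIOR": (32.5, 7.9),
    "Clipart1k": (59.0, 53.5),
    "DeepFish": (35.5, 25.7),
    "UODD": (26.4, 16.7),
}


def fine_tuning_gains(results):
    """Absolute mAP gain from fine-tuning for each dataset."""
    return {name: round(ft - no_ft, 1) for name, (ft, no_ft) in results.items()}


def mean_map(results, index):
    """Mean mAP over datasets; index 0 is w/FT, index 1 is w/o FT."""
    values = [pair[index] for pair in results.values()]
    return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    for name, gain in fine_tuning_gains(RESULTS).items():
        print(f"{name}: +{gain} mAP")
    print("mean w/FT:", mean_map(RESULTS, 0))
    print("mean w/o FT:", mean_map(RESULTS, 1))
```

Across the six datasets, the mean mAP is 40.0 with fine-tuning and 24.2 without. The gain is largest on Artaxor (+31.4) and smallest on Clipart1k (+5.5), where the model without fine-tuning is already comparatively strong.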