Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Detect Everything with Few Examples

Xinyu Zhang, YuHan Liu, Yuting Wang, Abdeslam Boularias

2023-09-22

Tasks: Few-Shot Object Detection, Binary Classification, Open Vocabulary Object Detection, Object Detection, Cross-Domain Few-Shot Object Detection, One-Shot Object Detection

Paper · PDF · Code (official)

Abstract

Few-shot object detection aims at detecting novel categories given only a few example images. It is a basic skill for a robot to perform tasks in open environments. Recent methods focus on finetuning strategies, with complicated procedures that hinder wider application. In this paper, we introduce DE-ViT, a few-shot object detector without the need for finetuning. DE-ViT's novel architecture is based on a new region-propagation mechanism for localization. The propagated region masks are transformed into bounding boxes through a learnable spatial integral layer. Instead of training prototype classifiers, we propose to use prototypes to project ViT features into a subspace that is robust to overfitting on base classes. We evaluate DE-ViT on few-shot and one-shot object detection benchmarks with Pascal VOC, COCO, and LVIS. DE-ViT establishes new state-of-the-art results on all benchmarks. Notably, for COCO, DE-ViT surpasses the few-shot SoTA by 15 mAP on 10-shot and 7.2 mAP on 30-shot, and the one-shot SoTA by 2.8 AP50. For LVIS, DE-ViT outperforms the few-shot SoTA by 17 box APr. Further, we evaluate DE-ViT with a real robot by building a pick-and-place system for sorting novel objects based on example images. The videos of our robot demonstrations, the source code and the models of DE-ViT can be found at https://mlzxy.github.io/devit.
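The abstract mentions a learnable spatial integral layer that turns propagated region masks into bounding boxes. As a rough, non-learnable sketch of the underlying idea (this is not the paper's implementation; the function name and the uniform-spread assumption are hypothetical), a soft mask can be converted to a box via spatial expectations:

```python
import numpy as np

def soft_box_from_mask(mask):
    """Turn a soft region mask into a box via spatial expectations.

    Hypothetical sketch: treats the mask as a probability map, takes
    the mean as the box center and recovers the extent from the
    spatial standard deviation, assuming roughly uniform mass
    (for a uniform distribution of width w, std = w / (2 * sqrt(3))).
    mask: (H, W) array of non-negative region scores.
    """
    mask = mask / mask.sum()
    ys, xs = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    cx = (mask * xs).sum()
    cy = (mask * ys).sum()
    # spread of the mass approximates the box extent
    w = 2 * np.sqrt(3 * (mask * (xs - cx) ** 2).sum())
    h = 2 * np.sqrt(3 * (mask * (ys - cy) ** 2).sum())
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
```

Because every output coordinate is a differentiable weighted sum over the mask, a layer of this form can be trained end-to-end, which is presumably what makes the paper's learnable variant attractive.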

Results

Task | Dataset | Metric | Value | Model
Object Detection | MS-COCO (30-shot) | AP | 34 | DE-ViT
Object Detection | MS-COCO (10-shot) | AP | 34 | DE-ViT
Object Detection | ArTaxOr | mAP | 49.2 | DE-ViT-FT
Object Detection | ArTaxOr | mAP | 9.2 | DE-ViT (w/o FT)
Object Detection | NEU-DET | mAP | 8.8 | DE-ViT-FT
Object Detection | NEU-DET | mAP | 1.8 | DE-ViT (w/o FT)
Object Detection | DIOR | mAP | 25.6 | DE-ViT-FT
Object Detection | DIOR | mAP | 8.4 | DE-ViT (w/o FT)
Object Detection | Clipart1k | mAP | 40.8 | DE-ViT-FT
Object Detection | Clipart1k | mAP | 11 | DE-ViT (w/o FT)
Object Detection | DeepFish | mAP | 21.3 | DE-ViT-FT
Object Detection | DeepFish | mAP | 2.1 | DE-ViT (w/o FT)
Object Detection | UODD | mAP | 5.4 | DE-ViT-FT
Object Detection | UODD | mAP | 3.1 | DE-ViT (w/o FT)
Object Detection | LVIS v1.0 | AP novel (LVIS base training) | 34.3 | DE-ViT
Object Detection | MS-COCO | AP 0.5 | 50 | DE-ViT
Object Detection | COCO (Common Objects in Context) | AP 0.5 | 28.4 | DE-ViT
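The abstract proposes using prototypes to project ViT features into a subspace rather than training prototype classifiers. A minimal sketch of that idea (hypothetical simplification; the function name and the QR-based construction are assumptions, not the paper's method) projects features onto the span of the class prototypes and scores them by cosine similarity there:

```python
import numpy as np

def prototype_subspace_scores(features, prototypes):
    """Score features against class prototypes inside their span.

    Sketch of classifying via a prototype-spanned subspace instead
    of a trained classifier head. features: (N, D) ViT features,
    prototypes: (C, D) per-class prototype vectors (assumed
    linearly independent). Returns (N, C) similarity scores.
    """
    # orthonormal basis of the prototype span via reduced QR
    q, _ = np.linalg.qr(prototypes.T)      # (D, C)
    feat_coords = features @ q             # feature coordinates in subspace
    proto_coords = prototypes @ q          # prototypes in the same coordinates
    # cosine similarity to each prototype inside the subspace
    f = feat_coords / np.linalg.norm(feat_coords, axis=1, keepdims=True)
    p = proto_coords / np.linalg.norm(proto_coords, axis=1, keepdims=True)
    return f @ p.T
```

Projecting onto the prototype span discards feature directions that no prototype uses, which matches the abstract's motivation: those directions are where a classifier trained on base classes would otherwise overfit.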

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework (2025-07-10)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)