Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Detecting Twenty-thousand Classes using Image-level Supervision

Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra

Published: 2022-01-07
Tasks: Image Classification, Open Vocabulary Object Detection, Cross-Domain Few-Shot Object Detection

Abstract

Current object detectors are limited in vocabulary size due to the small scale of detection datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as their datasets are larger and easier to collect. We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. Unlike prior work, Detic does not need complex assignment schemes to assign image labels to boxes based on model predictions, making it much easier to implement and compatible with a range of detection architectures and backbones. Our results show that Detic yields excellent detectors even for classes without box annotations. It outperforms prior work on both open-vocabulary and long-tail detection benchmarks. Detic provides a gain of 2.4 mAP for all classes and 8.3 mAP for novel classes on the open-vocabulary LVIS benchmark. On the standard LVIS benchmark, Detic obtains 41.7 mAP when evaluated on all classes, or only rare classes, hence closing the gap in performance for object categories with few samples. For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning. Code is available at https://github.com/facebookresearch/Detic.
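The abstract says image labels supervise the detector's classifier without any prediction-based assignment of labels to boxes. One way to picture this is the rule described in the paper's full text, where the image-level label is applied to the largest proposal. The sketch below assumes that rule together with a per-class sigmoid (binary cross-entropy) loss. The function name, array shapes, and toy inputs are hypothetical and are not taken from the official repository.

```python
import numpy as np


def max_size_image_loss(boxes, logits, image_labels):
    """Illustrative image-level loss (hypothetical helper, not the official code).

    boxes:        (N, 4) proposals as x1, y1, x2, y2
    logits:       (N, C) classifier scores for each proposal
    image_labels: class indices known to be present in the image
    Returns the index of the supervised proposal and its loss value.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)

    # Pick the proposal by size alone; no model prediction is consulted.
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    idx = int(np.argmax(areas))

    # Multi-hot target built from the image-level labels.
    target = np.zeros(logits.shape[1])
    target[list(image_labels)] = 1.0

    # Per-class binary cross-entropy on the selected proposal only.
    probs = 1.0 / (1.0 + np.exp(-logits[idx]))
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)
    bce = -(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))
    return idx, float(bce.sum())


# Toy example: three proposals, four classes, image labeled with class 2.
toy_boxes = [[0, 0, 10, 10], [0, 0, 50, 40], [5, 5, 20, 20]]
toy_logits = np.zeros((3, 4))
chosen, loss = max_size_image_loss(toy_boxes, toy_logits, [2])
print(chosen, loss)
```

Because the supervised proposal is chosen by area rather than by classifier score, the rule does not depend on the detector already recognizing the class, which is consistent with the abstract's claim that no prediction-based assignment is needed.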

Results

| Dataset | Metric | Value | Model |
|---|---|---|---|
| Artaxor | mAP | 12 | Detic-FT |
| Artaxor | mAP | 0.6 | Detic (w/o FT) |
| NEU-DET | mAP | 16.8 | Detic-FT |
| DIOR | mAP | 15.4 | Detic-FT |
| DIOR | mAP | 0.1 | Detic (w/o FT) |
| Clipart1k | mAP | 22.3 | Detic-FT |
| Clipart1k | mAP | 11.4 | Detic (w/o FT) |
| DeepFish | mAP | 17.9 | Detic-FT |
| DeepFish | mAP | 0.9 | Detic (w/o FT) |
| UODD | mAP | 16.8 | Detic-FT |
| LVIS v1.0 | AP novel-LVIS base training | 17.8 | Detic |
| MSCOCO | AP 0.5 | 27.8 | Detic |
| OpenImages-v4 | AP 0.5 | 42.2 | Detic |
| OpenImages-v4 | mask AP50 | 42.2 | Detic |

The site lists these same rows under several task labels. Object Detection, 3D, 2D Classification, 2D Object Detection, and 16k each carry all fourteen rows. Few-Shot Object Detection carries only the ten cross-domain rows (Artaxor through UODD). Open Vocabulary Object Detection carries only the four LVIS v1.0, MSCOCO, and OpenImages-v4 rows.

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks (2025-07-14)
FedGSCA: Medical Federated Learning with Global Sample Selector and Client Adaptive Adjuster under Label Noise (2025-07-13)