Detecting Twenty-thousand Classes using Image-level Supervision

Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra

2022-01-07

Tasks: Image Classification, Open Vocabulary Object Detection, Cross-Domain Few-Shot Object Detection

Abstract

Current object detectors are limited in vocabulary size due to the small scale of detection datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as their datasets are larger and easier to collect. We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. Unlike prior work, Detic does not need complex assignment schemes to assign image labels to boxes based on model predictions, making it much easier to implement and compatible with a range of detection architectures and backbones. Our results show that Detic yields excellent detectors even for classes without box annotations. It outperforms prior work on both open-vocabulary and long-tail detection benchmarks. Detic provides a gain of 2.4 mAP for all classes and 8.3 mAP for novel classes on the open-vocabulary LVIS benchmark. On the standard LVIS benchmark, Detic obtains 41.7 mAP when evaluated on all classes, or only rare classes, hence closing the gap in performance for object categories with few samples. For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning. Code is available at https://github.com/facebookresearch/Detic.
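To make the idea of training a detector's classifier from image-level labels without prediction-based assignment more concrete, here is a minimal sketch. It assumes one simple prediction-free rule: the image label supervises the classification logits of the largest proposal. The function names, the toy boxes, and the logits are all hypothetical and are not taken from the Detic codebase; the loss is a plain binary cross-entropy written with the standard library only.

```python
import math


def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes get area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def largest_proposal_index(boxes):
    """Pick a proposal using geometry only, never the model's class scores."""
    return max(range(len(boxes)), key=lambda i: box_area(boxes[i]))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, in a numerically stable form."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def image_level_loss(boxes, class_logits, image_labels, num_classes):
    """Apply the image-level labels to the largest proposal's classifier output.

    Assumption for this sketch: the largest proposal is a reasonable stand-in
    for the object(s) named by the image label.
    """
    idx = largest_proposal_index(boxes)
    targets = [1.0 if c in image_labels else 0.0 for c in range(num_classes)]
    return idx, bce_with_logits(class_logits[idx], targets)


# Toy example: three proposals, three classes, image labelled with class 1.
boxes = [(10, 10, 50, 50), (0, 0, 200, 150), (30, 40, 90, 120)]
class_logits = [[2.0, -1.0, 0.5], [0.3, 1.2, -0.7], [-0.5, 0.1, 0.9]]
idx, loss = image_level_loss(boxes, class_logits, image_labels={1}, num_classes=3)
print(idx, round(loss, 3))
```

Because the proposal is chosen from box geometry alone, the supervision target does not change as the classifier's predictions change during training, which is the property the abstract contrasts with assignment schemes based on model predictions.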

Results

Task	Dataset	Metric	Value	Model
Object Detection	Artaxor	mAP	12	Detic-FT
Object Detection	Artaxor	mAP	0.6	Detic(w/o FT)
Object Detection	NEU-DET	mAP	16.8	Detic-FT
Object Detection	DIOR	mAP	15.4	Detic-FT
Object Detection	DIOR	mAP	0.1	Detic(w/o FT)
Object Detection	Clipart1k	mAP	22.3	Detic-FT
Object Detection	Clipart1k	mAP	11.4	Detic(w/o FT)
Object Detection	DeepFish	mAP	17.9	Detic-FT
Object Detection	DeepFish	mAP	0.9	Detic(w/o FT)
Object Detection	UODD	mAP	16.8	Detic-FT
Object Detection	LVIS v1.0	AP novel-LVIS base training	17.8	Detic
Object Detection	MSCOCO	AP 0.5	27.8	Detic
Object Detection	OpenImages-v4	AP 0.5	42.2	Detic
Object Detection	OpenImages-v4	mask AP50	42.2	Detic
Few-Shot Object Detection	Artaxor	mAP	12	Detic-FT
Few-Shot Object Detection	Artaxor	mAP	0.6	Detic(w/o FT)
Few-Shot Object Detection	NEU-DET	mAP	16.8	Detic-FT
Few-Shot Object Detection	DIOR	mAP	15.4	Detic-FT
Few-Shot Object Detection	DIOR	mAP	0.1	Detic(w/o FT)
Few-Shot Object Detection	Clipart1k	mAP	22.3	Detic-FT
Few-Shot Object Detection	Clipart1k	mAP	11.4	Detic(w/o FT)
Few-Shot Object Detection	DeepFish	mAP	17.9	Detic-FT
Few-Shot Object Detection	DeepFish	mAP	0.9	Detic(w/o FT)
Few-Shot Object Detection	UODD	mAP	16.8	Detic-FT
Open Vocabulary Object Detection	LVIS v1.0	AP novel-LVIS base training	17.8	Detic
Open Vocabulary Object Detection	MSCOCO	AP 0.5	27.8	Detic
Open Vocabulary Object Detection	OpenImages-v4	AP 0.5	42.2	Detic
Open Vocabulary Object Detection	OpenImages-v4	mask AP50	42.2	Detic
