Open-vocabulary Attribute Detection

María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

2022-11-23 | CVPR 2023 | Open Vocabulary Attribute Detection, Attribute, Open Vocabulary Object Detection, Language Modelling

Abstract

Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page: https://ovad-benchmark.github.io
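To illustrate why explicit positive and negative annotations matter for evaluation, the following minimal sketch computes a mean average precision over attributes from per-instance scores. It is not the benchmark's official evaluation code. The label encoding (1 for positive, 0 for negative, -1 for unannotated), the choice to ignore unannotated pairs, and the function names `average_precision` and `mean_average_precision` are all assumptions made for this example; the actual protocol may differ, for instance in how predicted boxes are matched to objects.

```python
from typing import List, Optional, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """AP for a single attribute.

    Assumed label encoding (hypothetical, for illustration only):
    1 = positive, 0 = negative, -1 = unannotated (ignored).
    """
    # Keep only instances with an explicit positive or negative annotation.
    pairs = [(s, l) for s, l in zip(scores, labels) if l in (0, 1)]
    num_pos = sum(l for _, l in pairs)
    if num_pos == 0:
        return None  # AP is undefined when there are no positives

    # Rank annotated instances by descending score.
    pairs.sort(key=lambda p: p[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(pairs, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    return precision_sum / num_pos


def mean_average_precision(
    score_matrix: List[List[float]], label_matrix: List[List[int]]
) -> float:
    """Mean of per-attribute APs; rows are instances, columns are attributes."""
    num_attrs = len(label_matrix[0])
    aps = []
    for a in range(num_attrs):
        ap = average_precision(
            [row[a] for row in score_matrix],
            [row[a] for row in label_matrix],
        )
        if ap is not None:  # skip attributes without any positive annotation
            aps.append(ap)
    if not aps:
        raise ValueError("no attribute has a positive annotation")
    return sum(aps) / len(aps)


# Toy data: 3 object instances, 2 attributes.
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
labels = [[1, -1], [0, 1], [1, 0]]
toy_map = mean_average_precision(scores, labels)
```

Without negative labels, every unannotated attribute would have to be treated as a negative, which penalizes correct predictions on attributes that annotators simply did not mark. Restricting the ranking to explicitly annotated pairs avoids that.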

Results

Task	Dataset	Metric	Value	Model
Object Detection	MSCOCO	AP 0.5	30	OVAD-Baseline
Object Detection	OVAD-Box benchmark	mean average precision	21.4	OVAD-Baseline-Box
Object Detection	OVAD benchmark	mean average precision	18.8	OVAD-Baseline (ResNet50)
