
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Open-vocabulary Attribute Detection

María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

Published: 2022-11-23
Venue: CVPR 2023
Tasks: Open Vocabulary Attribute Detection, Attribute, Open Vocabulary Object Detection, Language Modelling

Abstract

Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page https://ovad-benchmark.github.io
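The abstract notes that the benchmark carries both positive and negative annotations and that results are reported as mean average precision. As a rough illustration of how such a metric can be computed, here is a minimal sketch. It is not the official OVAD evaluation code: the function names, the label encoding (1 = positive, 0 = negative, -1 = not annotated), and the choice to skip unannotated entries and attributes without positives are all assumptions made for this example.

```python
import math


def average_precision(scores, labels):
    """Average precision for one attribute.

    labels use an assumed encoding: 1 = positive, 0 = negative,
    anything else (e.g. -1) = not annotated and therefore ignored.
    """
    # Keep only entries that carry an explicit positive or negative label.
    pairs = [(s, y) for s, y in zip(scores, labels) if y in (0, 1)]
    n_pos = sum(y for _, y in pairs)
    if n_pos == 0:
        return math.nan  # AP is undefined without any positive example

    # Rank predictions from highest to lowest score.
    pairs.sort(key=lambda p: -p[0])

    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(pairs, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / n_pos


def mean_average_precision(score_rows, label_rows):
    """Mean of per-attribute APs; rows are object instances, columns are attributes."""
    n_attr = len(score_rows[0])
    aps = []
    for a in range(n_attr):
        ap = average_precision(
            [row[a] for row in score_rows],
            [row[a] for row in label_rows],
        )
        if not math.isnan(ap):  # skip attributes with no positives
            aps.append(ap)
    return sum(aps) / len(aps) if aps else math.nan


if __name__ == "__main__":
    # Four hypothetical object instances, two hypothetical attributes.
    scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.5], [0.6, 0.3]]
    labels = [[1, 0], [0, 1], [1, -1], [-1, 0]]
    print(mean_average_precision(scores, labels))
```

Having explicit negatives is what makes the ranking meaningful here: an unlabeled attribute is left out of the computation rather than being counted as a false positive.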

Results

Dataset              Metric                    Value   Model
MSCOCO               AP 0.5                    30      OVAD-Baseline
OVAD-Box benchmark   mean average precision    21.4    OVAD-Baseline-Box
OVAD benchmark       mean average precision    18.8    OVAD-Baseline (ResNet50)

The same three results are listed under each of these task tags: Object Detection, 3D, 2D Classification, 2D Object Detection, Open Vocabulary Object Detection, 16k.

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Non-Adaptive Adversarial Face Generation (2025-07-16)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)