Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


OVIC Datasets

Open Vocabulary Image Classification Datasets

Images · CC BY-NC-SA 4.0 · Introduced 2024-07-15

Due to the free-form nature of the open vocabulary image classification task, special annotations are required for image sets used for evaluation purposes. Three such image datasets are presented here:

  • World: 272 images, the vast majority of which are originally sourced (have never appeared on the internet), collected in 10 countries by 12 people, with an active focus on covering as wide and varied a set of concepts as possible, including unusual, deceptive, and/or indirect representations of objects,
  • Wiki: 1000 Wikipedia lead images sampled from a scraped pool of 18K,
  • Val3K: 3000 images from the ImageNet-1K validation set, sampled uniformly across the classes.
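Uniform sampling across classes, as used for Val3K, can be sketched as follows. This is an illustrative sketch only (the function name and the `(image_id, class_id)` input format are assumptions, not the actual selection code, which lives in the NOVIC repository):

```python
import random
from collections import defaultdict

def sample_uniform_per_class(samples, per_class, seed=0):
    """Select an equal number of images from each class.

    samples: iterable of (image_id, class_id) pairs
    per_class: number of images to keep per class
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    by_class = defaultdict(list)
    for image_id, class_id in samples:
        by_class[class_id].append(image_id)
    selected = []
    for class_id in sorted(by_class):  # deterministic class order
        selected.extend(rng.sample(by_class[class_id], per_class))
    return selected

# For Val3K: 1000 ImageNet-1K classes x 3 images per class = 3000 images
```

With the 50 validation images available per ImageNet-1K class, keeping 3 per class yields the 3000-image subset described above.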

It is not in general possible to exhaustively annotate ground truth classification labels for open vocabulary image sets, as this would require annotations for every possible correct object noun in the English language for every visible entity in every part of every image. It is, however, possible to annotate the thousands of predictions that have been made across the image sets by the open vocabulary models trained thus far.

All three image datasets presented here have been individually annotated by both human and multimodal LLM annotators for the object nouns predicted by trained models. The annotations specify whether each classification is correct, close, or incorrect, and, for the human annotations, whether it relates to a primary or secondary element of the image. The suffixes -H and -L are customarily used to specify which annotations are being referred to, e.g. Wiki-H is the Wiki dataset with the corresponding human annotations. Together, the three datasets contain a total of 17.4K human and 112K LLM class annotations.
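The annotation scheme above can be illustrated with a small sketch. The record layout and field names here are hypothetical (the actual file format is defined in the NOVIC repository); the sketch only shows how a per-prediction verdict table would be used to grade model outputs:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ClassAnnotation:
    """One annotated model prediction (hypothetical layout)."""
    image_id: str
    noun: str               # predicted object noun
    verdict: str            # 'correct', 'close', or 'incorrect'
    primary: Optional[bool] # primary vs secondary image element (human annotations only)

def grade_predictions(predictions, annotations):
    """Look up each (image_id, noun) prediction in the annotation
    table and tally the verdicts."""
    table = {(a.image_id, a.noun): a.verdict for a in annotations}
    counts = {'correct': 0, 'close': 0, 'incorrect': 0, 'unannotated': 0}
    for image_id, noun in predictions:
        counts[table.get((image_id, noun), 'unannotated')] += 1
    return counts
```

Predictions not found in the annotation table are counted as unannotated; in practice these are the predictions that require a further annotation pass, as supported by the NOVIC annotation-update tools.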

The data is directly available at the following links:

  • World dataset
  • Wiki dataset
  • Val3K dataset

Refer to the NOVIC code for an example of how the datasets can be used, as well as tools for updating the class annotations for newer model predictions.

Related Benchmarks

  • OVIC Datasets (Val3K) / Zero-Shot Image Classification / Prediction Score (mean of 3)
  • OVIC Datasets (Val3K) / Zero-Shot Image Classification / Top 1 Accuracy (mean of 3)
  • OVIC Datasets (Wiki-H) / Zero-Shot Image Classification / Overall Score
  • OVIC Datasets (Wiki-H) / Zero-Shot Image Classification / Prediction Score
  • OVIC Datasets (Wiki-H) / Zero-Shot Image Classification / Prediction Score (mean of 3)
  • OVIC Datasets (Wiki-H) / Zero-Shot Image Classification / Top 1 Accuracy
  • OVIC Datasets (Wiki-L) / Zero-Shot Image Classification / Prediction Score (mean of 3)
  • OVIC Datasets (World-H) / Zero-Shot Image Classification / Overall Score
  • OVIC Datasets (World-H) / Zero-Shot Image Classification / Prediction Score
  • OVIC Datasets (World-H) / Zero-Shot Image Classification / Prediction Score (mean of 3)
  • OVIC Datasets (World-H) / Zero-Shot Image Classification / Top 1 Accuracy

Statistics

  • Papers: 1
  • Benchmarks: 0

Links

Homepage

Tasks

  • Open Vocabulary Image Classification
  • Open Vocabulary Object Detection
  • Zero-Shot Image Classification