
This corpus contains the data files generated as part of the NOVIC paper (see above). It includes the complete Object Noun Dictionary, the exact templates used for the multiset prompt templating strategy, and a large dataset of 1.8M LLM-generated and templated captions grouped by target noun. The captions cover all of the target nouns in the Object Noun Dictionary.

The data is directly available at the following links:

Object Noun Dictionary (JSON)
Multiset prompt templates
LLM-generated captions dataset

Refer to the NOVIC code and the Object Noun Dictionary code for examples of how the data can be used and how it can be regenerated.
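Before relying on any particular layout of the JSON files, it can help to inspect their top-level structure first. The following is a minimal sketch that assumes nothing about the actual schema of the Object Noun Dictionary; the helper names `summarize_json` and `summarize_json_file`, and any file path passed to them, are hypothetical and not part of the NOVIC code.

```python
import json
from pathlib import Path
from typing import Any


def summarize_json(value: Any, max_items: int = 3) -> dict:
    """Describe the top-level shape of a parsed JSON value.

    Hypothetical helper: it makes no assumption about the schema of the
    NOVIC data files and only reports what it finds.
    """
    if isinstance(value, dict):
        keys = list(value.keys())
        return {"type": "object", "size": len(keys), "sample_keys": keys[:max_items]}
    if isinstance(value, list):
        return {"type": "array", "size": len(value), "sample_items": value[:max_items]}
    # Scalars (str, int, float, bool, None) are reported directly.
    return {"type": type(value).__name__, "value": value}


def summarize_json_file(path: str, max_items: int = 3) -> dict:
    """Load a JSON file from a local path (assumed to be downloaded already)."""
    with Path(path).open(encoding="utf-8") as f:
        return summarize_json(json.load(f), max_items=max_items)
```

Calling `summarize_json_file` on a downloaded copy of the dictionary shows whether the file is an object or an array and a few of its keys or entries, which is usually enough to decide how to iterate over it.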

NOVIC Caption-Object Data