NOVIC Caption-Object Data

Texts | CC BY-NC-SA 4.0 | Introduced 2024-07-15

This corpus contains data files generated as part of the NOVIC paper (see above). It includes the complete Object Noun Dictionary, the exact templates used for the multiset prompt templating strategy, and a large dataset of 1.8M LLM-generated and templated captions organized by target noun. Captions were generated for every target noun in the Object Noun Dictionary.
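Because the captions are organized by target noun, a common first step is to collect them into a mapping from each noun to its list of captions. The sketch below shows one way to do that in Python. It is hypothetical: the actual NOVIC file layout is not described on this page, so the one-record-per-line `target_noun<TAB>caption` format, the `SAMPLE` data, and the `group_captions_by_noun` helper are all illustrative assumptions. Consult the NOVIC code for the real format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample records. The real NOVIC file layout is NOT described here;
# this assumes one record per line in the form "target_noun<TAB>caption".
SAMPLE = (
    "dog\tA photo of a dog.\n"
    "dog\tA dog running across a grassy field.\n"
    "teapot\tA ceramic teapot on a kitchen table.\n"
)


def group_captions_by_noun(lines):
    """Group assumed (noun, caption) records into a dict: noun -> list of captions."""
    grouped = defaultdict(list)
    # QUOTE_NONE keeps quotation marks inside captions as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        noun, caption = row
        grouped[noun].append(caption)
    return dict(grouped)


if __name__ == "__main__":
    captions = group_captions_by_noun(io.StringIO(SAMPLE))
    for noun, caps in sorted(captions.items()):
        print(noun, len(caps))
```

Run on the sample, this prints `dog 2` and `teapot 1`. Streaming the input line by line, rather than loading everything first, matters at the scale of 1.8M captions.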

The data is directly available at the following links:

Refer to the NOVIC code and the Object Noun Dictionary code for examples of how to use the data and how to regenerate it.