The nocaps benchmark consists of 166,100 human-generated captions describing 15,100 images from the OpenImages validation and test sets.
Source: nocaps: novel object captioning at scale
Image Source: https://nocaps.org/