VD-Ref

ImagesTextsApache-2.0 licenseIntroduced 2022-10-23

VD-Ref is a dataset with ground-truth mappings from both noun phrases and pronouns to image regions. This dataset contains a set of 10k complete sets from the VisDialog dataset, and uses the StanfordCoreNLP tool to tokenize the sentences, making it proper for the succeeding human annotation.