VD-Ref
Images, Texts · Apache-2.0 license · Introduced 2022-10-23
VD-Ref is a dataset with ground-truth mappings from both noun phrases and pronouns to image regions. It contains 10k complete sets drawn from the VisDialog dataset. The sentences are tokenized with the Stanford CoreNLP toolkit, which prepares them for the subsequent human annotation.
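To make the idea of a phrase-to-region mapping concrete, here is a minimal sketch of how one annotated sentence could be represented. Everything in it is an assumption for illustration: the `Mention` class, its field names, the box format, and the example sentence and coordinates are hypothetical and do not reflect the released VD-Ref schema. Whitespace splitting stands in for the Stanford CoreNLP tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout for illustration only; not the released VD-Ref schema.
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Mention:
    text: str                    # surface form of the mention
    kind: str                    # "noun_phrase" or "pronoun"
    token_span: Tuple[int, int]  # [start, end) indices into the token list
    boxes: List[Box]             # image regions the mention refers to


def mention_tokens(tokens: List[str], mention: Mention) -> List[str]:
    """Return the tokens covered by a mention's span."""
    start, end = mention.token_span
    return tokens[start:end]


# Whitespace split is a stand-in for the Stanford CoreNLP tokenizer.
tokens = "is it near the red car ?".split()

# Made-up mentions and coordinates, one pronoun and one noun phrase.
mentions = [
    Mention("it", "pronoun", (1, 2), [(40.0, 60.0, 180.0, 220.0)]),
    Mention("the red car", "noun_phrase", (3, 6), [(200.0, 90.0, 420.0, 260.0)]),
]

for m in mentions:
    print(m.kind, mention_tokens(tokens, m), m.boxes)
```

Storing token spans rather than raw strings keeps each mention tied to a fixed tokenization, which is one plausible reason to tokenize before annotation.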
Source: Extending Phrase Grounding with Pronouns in Visual Dialogues
Image Source: https://arxiv.org/pdf/2210.12658v1.pdf