ARVSU

Addressee Recognition in Visual Scenes with Utterances

Audio, Images. Introduced 2018-09-12

ARVSU contains a large collection of image variations of visual scenes. Each scenario is annotated with an utterance and the corresponding addressee.

Source: Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances