ARVSU
Addressee Recognition in Visual Scenes with Utterances
Audio, Images
Introduced 2018-09-12
ARVSU contains a large set of image variations of visual scenes, with an annotated utterance and a corresponding addressee for each scenario.
Source: Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances