VVAD-LRS3
Images | LGPLv2 | Introduced 2021-09-28
A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.
The dataset contains data for training a Visual Voice Activity Detection (VVAD) model. The data comes in four flavors:
- faceImages: A series of images of faces with the corresponding label True for speaking and False for not speaking
- lipImages: A series of images of lips with the corresponding label True for speaking and False for not speaking
- faceFeatures: A series of feature maps of faces, extracted with dlib's face landmark detection, with the corresponding label True for speaking and False for not speaking
- lipFeatures: A series of feature maps of lips, extracted with dlib's face landmark detection, with the corresponding label True for speaking and False for not speaking
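The four flavors share one shape: a series of frames paired with a boolean speaking label. A minimal sketch of how such a sample could be modeled in Python follows. The names `Flavor`, `VVADSample`, and `label_counts` are hypothetical and chosen for illustration; they are not part of the dataset's distribution or of any official loader, and the frame contents are left untyped because the on-disk format is not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Iterable, Sequence


class Flavor(Enum):
    """The four data flavors listed above (enum itself is hypothetical)."""
    FACE_IMAGES = "faceImages"
    LIP_IMAGES = "lipImages"
    FACE_FEATURES = "faceFeatures"
    LIP_FEATURES = "lipFeatures"


@dataclass(frozen=True)
class VVADSample:
    """Hypothetical container: one series of frames plus its label."""
    flavor: Flavor
    frames: Sequence[Any]  # images or landmark feature maps, one per time step
    speaking: bool         # True = speaking, False = not speaking

    def __post_init__(self) -> None:
        if len(self.frames) == 0:
            raise ValueError("a sample needs at least one frame")

    @property
    def is_image_flavor(self) -> bool:
        # faceImages and lipImages hold pixels; the other two hold landmarks.
        return self.flavor in (Flavor.FACE_IMAGES, Flavor.LIP_IMAGES)


def label_counts(samples: Iterable[VVADSample]) -> Dict[bool, int]:
    """Count speaking vs. not-speaking samples, e.g. to check class balance."""
    counts = {True: 0, False: 0}
    for sample in samples:
        counts[sample.speaking] += 1
    return counts


# Toy placeholder data, not real dataset content.
samples = [
    VVADSample(Flavor.LIP_FEATURES, frames=[[(0.0, 0.0)], [(0.1, 0.0)]], speaking=True),
    VVADSample(Flavor.FACE_IMAGES, frames=[[[0]]], speaking=False),
]
counts = label_counts(samples)
```

Keeping the flavor as an explicit field lets the same training code branch on image versus landmark input without guessing from the frame contents.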
Image source: https://arxiv.org/pdf/2109.13789v1.pdf