VVAD-LRS3

Images · LGPLv2 · Introduced 2021-09-28

A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.

The dataset contains data for training Visual Voice Activity Detection (VVAD) models. The data comes in four flavors:

  • faceImages: A series of images of faces with the corresponding label True for speaking and False for not speaking
  • lipImages: A series of images of lips with the corresponding label True for speaking and False for not speaking
  • faceFeatures: A series of feature maps extracted from faces with dlib's face landmark detection, with the corresponding label True for speaking and False for not speaking
  • lipFeatures: A series of feature maps extracted from lips with dlib's face landmark detection, with the corresponding label True for speaking and False for not speaking
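
As a rough illustration of the structure described above, a sample can be thought of as a series of frames plus one boolean label. The sketch below is a minimal in-memory model only: the four flavor names come from the list above, while `VVADSample`, `label_counts`, and the toy frame values are hypothetical names and placeholders. The on-disk format, frame sizes, and series length are not described here, so nothing below should be read as the dataset's actual loading API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Sequence

# The four flavor names are taken from the dataset description;
# everything else in this sketch is a hypothetical illustration.
FLAVORS = ("faceImages", "lipImages", "faceFeatures", "lipFeatures")


@dataclass(frozen=True)
class VVADSample:
    """One labeled series (hypothetical container, not the dataset's own format)."""

    flavor: str            # one of FLAVORS
    frames: Sequence[Any]  # one entry per time step: an image or a feature map
    speaking: bool         # True for speaking, False for not speaking

    def __post_init__(self) -> None:
        if self.flavor not in FLAVORS:
            raise ValueError(f"unknown flavor: {self.flavor!r}")
        if len(self.frames) == 0:
            raise ValueError("a sample needs at least one frame")


def label_counts(samples: Iterable[VVADSample]) -> Dict[str, int]:
    """Count speaking and not-speaking samples, e.g. to check class balance."""
    counts = {"speaking": 0, "not_speaking": 0}
    for sample in samples:
        counts["speaking" if sample.speaking else "not_speaking"] += 1
    return counts


# Toy placeholders stand in for real images and landmark feature maps.
toy_samples = [
    VVADSample("lipImages", frames=["frame0", "frame1", "frame2"], speaking=True),
    VVADSample("lipFeatures", frames=[[(0.1, 0.2)], [(0.1, 0.3)]], speaking=False),
]
print(label_counts(toy_samples))
```

Keeping the flavor on each sample makes it harder to mix image and feature inputs by accident, since a model trained on one flavor generally cannot consume another.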

Image source: https://arxiv.org/pdf/2109.13789v1.pdf