
A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.

The dataset contains data for training a Visual Voice Activity Detection (VVAD) model. The data comes in four different flavors:

faceImages: A series of face images with the corresponding label: True for speaking, False for not speaking
lipImages: A series of lip images with the corresponding label: True for speaking, False for not speaking
faceFeatures: A series of facial feature maps extracted with dlib's face landmark detection, with the corresponding label: True for speaking, False for not speaking
lipFeatures: A series of lip feature maps extracted with dlib's face landmark detection, with the corresponding label: True for speaking, False for not speaking
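The page does not specify how the feature flavors are stored, so the following is only a minimal sketch of how a faceFeatures-style sample and a lipFeatures-style sample could relate. It assumes that each frame's features are the (x, y) coordinates from dlib's 68-point landmark model, where the mouth occupies indices 48 to 67, and that each sample is one sequence of frames with a single True/False label. The names `VVADSample`, `face_to_lip_features` and `to_binary_target` are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: each frame's face features are the 68 (x, y) landmarks produced by
# dlib's 68-point shape predictor (iBUG 300-W layout). Not confirmed by the dataset page.
Point = Tuple[float, float]
FrameLandmarks = List[Point]

# In the 68-point layout, the mouth landmarks occupy 0-based indices 48..67.
MOUTH_START, MOUTH_END = 48, 68


@dataclass
class VVADSample:
    """Hypothetical container: one sequence of per-frame features plus one label."""
    frames: List[FrameLandmarks]
    speaking: bool  # True = speaking, False = not speaking


def face_to_lip_features(sample: VVADSample) -> VVADSample:
    """Derive a lipFeatures-style sample by keeping only the mouth landmarks per frame."""
    lip_frames = []
    for landmarks in sample.frames:
        if len(landmarks) != 68:
            raise ValueError(f"expected 68 landmarks per frame, got {len(landmarks)}")
        lip_frames.append(landmarks[MOUTH_START:MOUTH_END])
    return VVADSample(frames=lip_frames, speaking=sample.speaking)


def to_binary_target(sample: VVADSample) -> int:
    """Map the boolean label to a 0/1 training target (1 = speaking)."""
    return int(sample.speaking)


if __name__ == "__main__":
    # Synthetic example: 3 frames of dummy landmarks, labeled as speaking.
    dummy_frame = [(float(i), float(i)) for i in range(68)]
    face_sample = VVADSample(frames=[dummy_frame] * 3, speaking=True)
    lip_sample = face_to_lip_features(face_sample)
    print(len(lip_sample.frames), len(lip_sample.frames[0]), to_binary_target(lip_sample))
    # prints: 3 20 1
```

Under these assumptions a lip sample keeps the same number of frames and the same label as its face sample, but only 20 of the 68 landmarks per frame.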

VVAD-LRS3