MAVS

Multilingual Audio-Visual Smartphone dataset

Speech, Videos. Introduced 2021-09-09

MAVS is an audio-visual dataset captured with five recent smartphones. It contains 103 subjects recorded in three sessions that cover different real-world scenarios. Recordings are collected in three languages so that the language dependency of speaker recognition systems can be studied.