MUSAN

AudioMusicSpeechCC BY 4.0Introduced 2015-10-28

MUSAN is a corpus of music, speech and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises.