InaGVAD is a Voice Activity Detection (VAD) and Speaker Gender Segmentation (SGS) dataset designed to represent the acoustic diversity of French TV and radio programs. A detailed description of InaGVAD, together with a benchmark of 6 freely available VAD systems and 3 SGS systems, is provided in a paper presented at LREC-COLING 2024.

InaGVAD contains 277 annotated 1-minute recordings (4 h 37 min in total), partitioned into a 1-hour development subset and a 3 h 37 min test subset to allow fair and reproducible system evaluation. The evaluation scripts distributed with the corpus produce performance estimates under the same conditions as those used for the 6 VAD and 3 SGS systems in the associated paper. Recordings were collected from 10 French radio and 18 TV channels, categorized into 4 groups associated with diverse acoustic conditions: generalist radio, music radio, news TV, and generalist TV.
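The description above does not say which metrics the inaGVAD evaluation scripts compute. As a rough illustration of what VAD scoring on 1-minute recordings can involve, here is a minimal sketch of a generic frame-level miss / false-alarm computation. It assumes speech annotations given as (start, end) segments in seconds and a 10 ms frame step; the function names `to_frames` and `vad_error_rates` are hypothetical and are not the corpus's scripts.

```python
from typing import List, Tuple

# Hypothetical representation: a speech region as (start_seconds, end_seconds).
Segment = Tuple[float, float]


def to_frames(segments: List[Segment], duration: float, step: float = 0.01) -> List[bool]:
    """Rasterize speech segments into boolean frames of `step` seconds (assumed 10 ms)."""
    n = round(duration / step)
    frames = [False] * n
    for start, end in segments:
        first = max(0, round(start / step))
        last = min(n, round(end / step))
        for i in range(first, last):
            frames[i] = True
    return frames


def vad_error_rates(ref: List[Segment], hyp: List[Segment],
                    duration: float, step: float = 0.01) -> Tuple[float, float]:
    """Return (miss rate, false-alarm rate) at frame level.

    Miss rate: reference speech frames the system labelled non-speech,
    divided by the number of reference speech frames.
    False-alarm rate: reference non-speech frames the system labelled speech,
    divided by the number of reference non-speech frames.
    """
    r = to_frames(ref, duration, step)
    h = to_frames(hyp, duration, step)
    speech = sum(r)
    nonspeech = len(r) - speech
    miss = sum(1 for is_ref, is_hyp in zip(r, h) if is_ref and not is_hyp)
    false_alarm = sum(1 for is_ref, is_hyp in zip(r, h) if is_hyp and not is_ref)
    miss_rate = miss / speech if speech else 0.0
    fa_rate = false_alarm / nonspeech if nonspeech else 0.0
    return miss_rate, fa_rate


if __name__ == "__main__":
    # A 60 s recording with 40 s of reference speech; the system misses
    # the first 4 s and fires on 2 s of non-speech.
    print(vad_error_rates([(0.0, 40.0)], [(4.0, 40.0), (50.0, 52.0)], duration=60.0))
```

In this example both rates come out at about 0.1 (400 of 4000 speech frames missed, 200 of 2000 non-speech frames falsely detected). Frame-level scoring is only one option; segment-level or collar-based metrics are also common, and the official scripts should be used to reproduce the paper's figures.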

InaGVAD provides an extended VAD and SGS annotation scheme, which makes it possible to describe the diverse abilities of systems according to the following categories:

* Speaker trait categories:
** 3 genders: Female, Male, I Don't Know (IDK)
** 3 age groups: Young (prepubescent), Adult, Elderly (senior)
** 3 speech qualities: standard; interjections (ah, oh, eg, aie); atypical (crying, laughing or shouted speech, voices of ill persons, artificially distorted voices, vocal performance, monster voice...)
* 10 non-speech event categories: applause, environmental noise, hubbub, jingle, foreground music, background music, respiration, non-intelligible laughter, other, empty
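To make the structure of the annotation scheme above concrete, here is a minimal sketch of how its categories could be modelled in code. All class names, member names and string values are hypothetical illustrations, not inaGVAD's actual label strings or file format. It also assumes that each speech segment carries exactly one value per speaker-trait axis, which the description does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Gender(Enum):
    FEMALE = "female"
    MALE = "male"
    IDK = "idk"  # "I Don't Know"


class AgeGroup(Enum):
    YOUNG = "young"      # prepubescent
    ADULT = "adult"
    ELDERLY = "elderly"  # senior


class SpeechQuality(Enum):
    STANDARD = "standard"
    INTERJECTION = "interjection"  # ah, oh, ...
    ATYPICAL = "atypical"          # crying, shouting, distorted voices, ...


class NonSpeechEvent(Enum):
    APPLAUSE = "applause"
    ENVIRONMENTAL_NOISE = "environmental_noise"
    HUBBUB = "hubbub"
    JINGLE = "jingle"
    FOREGROUND_MUSIC = "foreground_music"
    BACKGROUND_MUSIC = "background_music"
    RESPIRATION = "respiration"
    NON_INTELLIGIBLE_LAUGHTER = "non_intelligible_laughter"
    OTHER = "other"
    EMPTY = "empty"


@dataclass(frozen=True)
class SpeechLabel:
    """Hypothetical speech label: one value per speaker-trait axis."""
    gender: Gender
    age: AgeGroup
    quality: SpeechQuality


@dataclass(frozen=True)
class AnnotatedSegment:
    """Hypothetical annotated region of a recording, times in seconds."""
    start: float
    end: float
    label: Union[SpeechLabel, NonSpeechEvent]

    def is_speech(self) -> bool:
        # Collapses the extended scheme to a binary VAD decision.
        return isinstance(self.label, SpeechLabel)

    def gender(self) -> Optional[Gender]:
        # Collapses the scheme to an SGS label; None for non-speech regions.
        return self.label.gender if isinstance(self.label, SpeechLabel) else None
```

Keeping the fine-grained traits on each segment, and deriving the VAD and SGS views from them, is one way to break system results down by age group, speech quality or non-speech event type, in the spirit of the extended scheme.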

The entire inaGVAD package, including the corpus, annotations, evaluation scripts, and baseline training code, is freely accessible to foster future advances in the field.