SG-VAD: Stochastic Gates Based Speech Activity Detection

Jonathan Svirsky, Ofir Lindenbaum

2022-10-28Action Detection Denoising Activity Detection

Abstract

We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task, and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperforms the previously proposed methods on the AVA-Speech evaluation set, and provides comparative results on the HAVIC dataset. We present its architecture, experimental results, and ablation study on the model's components. We publish the code and the models here https://www.github.com/jsvir/vad.

Results

Task	Dataset	Metric	Value	Model
Activity Detection	AVA-Speech	ROC-AUC	94.3	SG-VAD (ours)

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models2025-07-17 Similarity-Guided Diffusion for Contrastive Sequential Recommendation2025-07-16 HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing2025-07-15 AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air2025-07-15 A statistical physics framework for optimal learning2025-07-10 LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models2025-07-08 Unconditional Diffusion for Generative Sequential Recommendation2025-07-08