ImportantAug: a data augmentation agent for speech

Viet Anh Trinh, Hassan Salami Kavaki, Michael I Mandel

2021-12-14 | ICASSP 2022 | Speech Recognition, Keyword Spotting, Data Augmentation

Abstract

We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions. Importance is predicted for each utterance by a data augmentation agent that is trained to maximize the amount of noise it adds while minimizing its impact on recognition performance. The effectiveness of our method is illustrated on version two of the Google Speech Commands (GSC) dataset. On the standard GSC test set, it achieves a 23.3% relative error rate reduction compared to conventional noise augmentation, which applies noise to speech without regard to where it might be most effective. It also provides a 25.4% error rate reduction compared to a baseline without data augmentation. Additionally, ImportantAug outperforms both conventional noise augmentation and the baseline on two test sets with additional noise added.
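The core idea in the abstract, adding noise only where the speech is unimportant, can be illustrated with a small sketch. This is not the authors' implementation: it assumes the importance prediction is an array with values in [0, 1] (1 meaning important) of the same shape as a time-frequency representation of the speech, and that noise is mixed in additively. The function names `add_noise_to_unimportant` and `agent_objective`, and the `noise_weight` parameter, are hypothetical and chosen only for illustration.

```python
import numpy as np


def add_noise_to_unimportant(speech, noise, importance):
    """Mix noise into speech only where the importance map is low.

    Assumption: all three arrays share one shape, and importance lies
    in [0, 1], where 1 marks a region that should stay clean.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    importance = np.clip(np.asarray(importance, dtype=float), 0.0, 1.0)
    if not (speech.shape == noise.shape == importance.shape):
        raise ValueError("speech, noise, and importance must share a shape")
    # Noise is scaled by (1 - importance): fully kept where importance
    # is 0, fully suppressed where importance is 1.
    return speech + (1.0 - importance) * noise


def agent_objective(recognition_loss, importance, noise_weight=1.0):
    """Hypothetical scalar objective for the augmentation agent.

    Minimizing the mean importance corresponds to adding more noise,
    while the recognition loss penalizes noise that hurts the recognizer.
    """
    return float(recognition_loss) + noise_weight * float(np.mean(importance))


# Toy example: 2 frequency bins x 3 time frames.
speech = np.ones((2, 3))
noise = np.full((2, 3), 0.5)
importance = np.array([[1.0, 1.0, 0.0],
                       [0.0, 1.0, 0.0]])
augmented = add_noise_to_unimportant(speech, noise, importance)
```

In this toy example, entries where `importance` is 1 stay equal to the clean speech, and entries where it is 0 receive the full noise value. The trade-off described in the abstract shows up in `agent_objective`: a map that marks everything unimportant lowers the second term but would typically raise the recognition loss.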

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	Google Speech Commands - Musan	Error rate - SNR 0dB	13.3	ImportantAug
Keyword Spotting	Google Speech Commands	Google Speech Command-Musan	86.7	ImportantAug
Keyword Spotting	Google Speech Commands	Google Speech Commands V2 35	95	ImportantAug
