ImportantAug: a data augmentation agent for speech

Viet Anh Trinh, Hassan Salami Kavaki, Michael I Mandel

2021-12-14 | ICASSP 2022 4 | Speech Recognition, Keyword Spotting, Data Augmentation

Abstract

We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions. Importance is predicted for each utterance by a data augmentation agent that is trained to maximize the amount of noise it adds while minimizing its impact on recognition performance. The effectiveness of our method is illustrated on version two of the Google Speech Commands (GSC) dataset. On the standard GSC test set, it achieves a 23.3% relative error rate reduction compared to conventional noise augmentation, which applies noise to speech without regard to where it might be most effective. It also provides a 25.4% error rate reduction compared to a baseline without data augmentation. Additionally, the proposed ImportantAug outperforms both conventional noise augmentation and the baseline on two test sets with additional noise added.
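The core idea in the abstract, adding noise only where the speech is unimportant, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `add_noise_to_unimportant`, the per-sample representation, and the hand-written importance values are all assumptions made for illustration. In the paper the importance is predicted by a trained agent, which is not modeled here.

```python
from typing import List, Sequence


def add_noise_to_unimportant(
    speech: Sequence[float],
    noise: Sequence[float],
    importance: Sequence[float],
) -> List[float]:
    """Mix noise into speech, scaled by (1 - importance) at each position.

    importance is assumed to lie in [0, 1], where 1 marks a region that
    should stay clean and 0 marks a region that receives the full noise.
    """
    if not (len(speech) == len(noise) == len(importance)):
        raise ValueError("speech, noise, and importance must have the same length")

    augmented = []
    for s, n, m in zip(speech, noise, importance):
        m = min(1.0, max(0.0, m))  # clamp the mask to [0, 1]
        augmented.append(s + (1.0 - m) * n)
    return augmented


# Hypothetical toy signal: the middle three samples are marked important.
speech = [0.0, 0.5, 0.9, 0.4, 0.0, 0.0]
noise = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3]
importance = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0]

augmented = add_noise_to_unimportant(speech, noise, importance)
```

With these toy values, the samples marked important are left unchanged while the remaining samples receive the full noise. A real pipeline would apply the same masking to a spectrogram or waveform array rather than a short list.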

Results

Task | Dataset | Metric | Value | Model
Speech Recognition | Google Speech Commands - Musan | Error rate - SNR 0dB | 13.3 | ImportantAug
Keyword Spotting | Google Speech Commands | Google Speech Command-Musan | 86.7 | ImportantAug
Keyword Spotting | Google Speech Commands | Google Speech Commands V2 35 | 95 | ImportantAug
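The results mix an error rate with values that look like accuracies, and the abstract reports relative error rate reductions. The sketch below shows how these quantities relate. It assumes the 86.7 value listed for Keyword Spotting is an accuracy in percent, which the page does not state explicitly. The function names are hypothetical, and the 10.0 and 8.0 inputs in the second call are made-up numbers, not results from the paper.

```python
def error_rate_from_accuracy(accuracy_pct: float) -> float:
    """Convert an accuracy in percent to an error rate in percent."""
    return 100.0 - accuracy_pct


def relative_error_reduction(reference_error: float, new_error: float) -> float:
    """Relative error rate reduction, in percent, of new_error versus reference_error."""
    if reference_error <= 0:
        raise ValueError("reference_error must be positive")
    return 100.0 * (reference_error - new_error) / reference_error


# If 86.7 is an accuracy, it matches the 13.3 error rate reported at SNR 0dB.
musan_error = round(error_rate_from_accuracy(86.7), 1)

# Hypothetical example: going from 10.0% to 8.0% error is a 20% relative reduction.
example_reduction = relative_error_reduction(10.0, 8.0)
```

Under that assumption, the two Musan rows describe the same result from complementary angles: one as an error rate, the other as an accuracy.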
