Training Keyword Spotters with Limited and Synthesized Speech Data

James Lin, Kevin Kilgour, Dominik Roblek, Matthew Sharifi

2020-01-31 · Keyword Spotting

Abstract

With the rise of low-power, speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts of the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term detection models of around 400k parameters. Instead of training such models directly on the audio or on low-level features such as MFCCs, we use a pre-trained speech embedding model trained to extract useful features for keyword spotting models. Using this speech embedding, we show that a model that detects 10 keywords when trained on only synthetic speech is equivalent to a model trained on over 500 real examples. We also show that a model without our speech embeddings would need to be trained on over 4000 real examples to reach the same accuracy.
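The general pattern the abstract describes, a frozen pre-trained embedding feeding a small trainable head, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `FROZEN_WEIGHTS` is a hand-written fixed projection standing in for a pre-trained speech embedding, the head is a plain softmax classifier rather than the paper's head architecture, and the two-class toy data is synthetic numbers, not speech. Only the head's weights are updated during training.

```python
import math
import random

# Hypothetical stand-in for a pre-trained speech embedding: a fixed projection
# that is never updated. This is NOT the embedding model from the paper.
FROZEN_WEIGHTS = [
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, 0.5, 0.5],
]


def embed(features):
    """Map raw input features to an embedding using the frozen weights."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in FROZEN_WEIGHTS]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(embedding, head, bias):
    """Linear head: one weight row and one bias per class."""
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(head, bias)]


def train_head(embeddings, labels, num_classes, epochs=50, lr=0.5):
    """Train only the head with per-example gradient steps on cross-entropy."""
    dim = len(embeddings[0])
    head = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        for emb, label in zip(embeddings, labels):
            probs = softmax(logits_for(emb, head, bias))
            for k in range(num_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                bias[k] -= lr * grad
                for j in range(dim):
                    head[k][j] -= lr * grad * emb[j]
    return head, bias


def predict(embedding, head, bias):
    """Return the index of the highest-scoring class."""
    scores = logits_for(embedding, head, bias)
    return max(range(len(scores)), key=lambda k: scores[k])


def make_toy_examples(rng, per_class=20):
    """Synthetic two-class data (illustrative only, not audio)."""
    centers = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    return [([c + rng.gauss(0.0, 0.05) for c in center], label)
            for label, center in enumerate(centers)
            for _ in range(per_class)]


def demo(seed=0):
    """Embed the toy data once, train the head, return training accuracy."""
    rng = random.Random(seed)
    examples = make_toy_examples(rng)
    embeddings = [embed(x) for x, _ in examples]
    labels = [y for _, y in examples]
    head, bias = train_head(embeddings, labels, num_classes=2)
    correct = sum(predict(e, head, bias) == y
                  for e, y in zip(embeddings, labels))
    return correct / len(labels)
```

Calling `demo()` returns the fraction of toy examples the head classifies correctly. Because the embedding is frozen, each input is embedded once and only the head's few parameters are fit, which is the property that lets a small head be trained from limited data in a setup like the one the abstract describes.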

Results

Task | Dataset | Metric | Value | Model
Keyword Spotting | Google Speech Commands | Google Speech Commands V2 12 | 97.7 | Embedding + Head
Keyword Spotting | Google Speech Commands | Google Speech Commands V2 12 | 97.4 | Head without Embedding
