Broadcasted Residual Learning for Efficient Keyword Spotting

Byeonggeun Kim, Simyung Chang, Jinkyu Lee, Dooyong Sung

2021-06-08 | Keyword Spotting

Abstract

Keyword spotting is an important research field because it plays a key role in device wake-up and user interaction on smart devices. However, it is challenging to minimize errors while operating efficiently on devices with limited resources, such as mobile phones. We present a broadcasted residual learning method that achieves high accuracy with a small model size and computational load. Our method configures most of the residual functions as 1D temporal convolutions while still allowing 2D convolution alongside them, using a broadcasted-residual connection that expands the temporal output to the frequency-temporal dimension. This residual mapping enables the network to represent useful audio features effectively with much less computation than conventional convolutional neural networks. We also propose a novel network architecture, the Broadcasting-residual network (BC-ResNet), based on broadcasted residual learning, and describe how to scale up the model according to the target device's resources. BC-ResNets achieve state-of-the-art top-1 accuracy of 98.0% and 98.7% on the Google Speech Commands datasets v1 and v2, respectively, and consistently outperform previous approaches while using fewer computations and parameters. Code is available at https://github.com/Qualcomm-AI-research/bcresnet.
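The broadcasted-residual connection described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). It assumes a single-channel feature map stored as nested lists indexed [frequency][time], a frequency-wise filter as a stand-in for the 2D part, and average pooling over frequency to obtain the temporal sequence. The abstract does not spell out these details. The names `conv1d_same`, `frequency_conv`, and `broadcasted_residual` are hypothetical, and normalization, activations, and channels are omitted.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [frequency][time]


def conv1d_same(seq: List[float], kernel: List[float]) -> List[float]:
    """Same-length 1D cross-correlation with zero padding (odd kernel sizes)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(k * padded[t + i] for i, k in enumerate(kernel))
        for t in range(len(seq))
    ]


def frequency_conv(x: Matrix, kernel: List[float]) -> Matrix:
    """Stand-in for the 2D part: filter along frequency at every time step."""
    n_freq, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for t in range(n_time):
        column = [x[f][t] for f in range(n_freq)]
        filtered = conv1d_same(column, kernel)
        for f in range(n_freq):
            out[f][t] = filtered[f]
    return out


def broadcasted_residual(
    x: Matrix, freq_kernel: List[float], time_kernel: List[float]
) -> Matrix:
    """Return x + h + broadcast(temporal), where h is the frequency-aware
    feature and temporal is a 1D sequence computed after frequency pooling."""
    n_freq, n_time = len(x), len(x[0])
    h = frequency_conv(x, freq_kernel)
    # Assumption: collapse the frequency axis by averaging, leaving one
    # value per time step, so the remaining work is cheap 1D convolution.
    pooled = [sum(h[f][t] for f in range(n_freq)) / n_freq for t in range(n_time)]
    temporal = conv1d_same(pooled, time_kernel)
    # Broadcast: the same temporal value is added to every frequency bin.
    return [
        [x[f][t] + h[f][t] + temporal[t] for t in range(n_time)]
        for f in range(n_freq)
    ]


# Identity kernels make the arithmetic easy to follow by hand.
example = broadcasted_residual([[1, 2, 3], [3, 2, 1]], [1.0], [1.0])
print(example)  # [[4.0, 6.0, 8.0], [8.0, 6.0, 4.0]]
```

In this sketch the temporal branch processes one sequence of length T instead of an F x T map, which is where the computational saving would come from. The broadcast addition restores the frequency-temporal shape so the result can be summed with the 2D features.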

Results

Task	Dataset	Metric	Value	Model
Keyword Spotting	Google Speech Commands V1 12	Top-1 accuracy (%)	98.0	BC-ResNet-8
Keyword Spotting	Google Speech Commands V2 12	Top-1 accuracy (%)	98.7	BC-ResNet-8

Related Papers

Enhancing Few-shot Keyword Spotting Performance through Pre-Trained Self-supervised Speech Models (2025-06-21)
Low-resource keyword spotting using contrastively trained transformer acoustic word embeddings (2025-06-21)
ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors (2025-06-17)
GLAP: General contrastive audio-text pretraining across domains and languages (2025-06-12)
Advances in Small-Footprint Keyword Spotting: A Comprehensive Review of Efficient Models and Algorithms (2025-06-12)
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models (2025-06-10)
Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU (2025-06-10)
Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting (2025-06-06)