TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Temporal Convolution for Real-time Keyword Spotting on Mob...

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha

2019-04-08Keyword Spotting
PaperPDFCodeCode(official)Code

Abstract

Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. In Google Speech Command Dataset, we achieve more than \textbf{385x} speedup on Google Pixel 1 and surpass the accuracy compared to the state-of-the-art model. In addition, we release the implementation of the proposed and the baseline models including an end-to-end pipeline for training models and evaluating them on mobile devices.

Results

TaskDatasetMetricValueModel
Keyword SpottingGoogle Speech CommandsGoogle Speech Commands V2 1296.6TC-ResNet14-1.5

Related Papers

Enhancing Few-shot Keyword Spotting Performance through Pre-Trained Self-supervised Speech Models2025-06-21Low-resource keyword spotting using contrastively trained transformer acoustic word embeddings2025-06-21ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors2025-06-17GLAP: General contrastive audio-text pretraining across domains and languages2025-06-12Advances in Small-Footprint Keyword Spotting: A Comprehensive Review of Efficient Models and Algorithms2025-06-12SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models2025-06-10Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU2025-06-10Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting2025-06-06