
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A neural attention model for speech command recognition

Douglas Coimbra de Andrade, Sabato Leo, Martin Loesener Da Silva Viana, Christoph Bernkopf

2018-08-27 | Image Captioning

Abstract

This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools for improving performance on natural language, image captioning, and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on the Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-command recognition task), while keeping a small footprint of only 202K trainable parameters. Results are compared with previous convolutional implementations on 5 different tasks: 20-command recognition (V1 and V2), 12-command recognition (V1), 35-word recognition (V1), and left-right (V1). We show detailed performance results and demonstrate that the proposed attention mechanism not only improves performance but also makes it possible to inspect which regions of the audio the network took into consideration when outputting a given category.
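To illustrate why attention weights make a model's decision inspectable, here is a minimal sketch of attention pooling over a sequence of audio-frame features. It is not the paper's architecture: the dot-product scoring, the function names `softmax` and `attention_pool`, and the toy frame and query values are all assumptions chosen for illustration. The point is only that the weights form a distribution over time steps, so the largest weight marks the frame that contributed most to the pooled representation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Pool a sequence of frame features into one context vector.

    frames: list of T feature vectors, each of length D (hypothetical
            recurrent-layer outputs, one per audio frame).
    query:  vector of length D (hypothetical; how the query is produced
            is an assumption of this sketch).
    Returns (context, weights), where weights sum to 1 over the T frames.
    """
    # Dot-product relevance score of each frame against the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frame features.
    dim = len(query)
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return context, weights

# Toy input: four frames, the third one carrying most of the "signal".
frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.1, 0.2]]
query = [1.0, 1.0]
context, weights = attention_pool(frames, query)

# Index of the frame the pooling attended to most.
peak = max(range(len(weights)), key=weights.__getitem__)
print("attention weights:", [round(w, 3) for w in weights])
print("most attended frame:", peak)
```

Plotting `weights` against time for a real utterance is the kind of inspection the abstract refers to: high-weight regions are the parts of the audio that dominated the prediction.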

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Keyword Spotting | Google Speech Commands | Google Speech Commands V1 12 | 95.6 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V1 2 | 99.2 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V1 20 | 94.1 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V1 35 | 94.3 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V2 12 | 96.9 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V2 2 | 99.4 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V2 20 | 94.5 | Attention RNN |
| Keyword Spotting | Google Speech Commands | Google Speech Commands V2 35 | 93.9 | Attention RNN |

Related Papers

Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval (2025-06-28)
HalLoc: Token-level Localization of Hallucinations for Vision Language Models (2025-06-12)
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs (2025-06-11)
A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning (2025-06-11)
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning (2025-06-11)
Edit Flows: Flow Matching with Edit Operations (2025-06-10)
Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings (2025-06-10)