Speech Recognition with Deep Recurrent Neural Networks

Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton

2013-03-22Speech Recognition Handwriting Recognition Phoneme Recognition

Abstract

Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates \emph{deep recurrent neural networks}, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	TIMIT	Percentage error	17.7	Bi-LSTM + skip connections w/ CTC

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine2025-07-17 NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech2025-07-17 WhisperKit: On-device Real-time ASR with Billion-Scale Transformers2025-07-14 VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis2025-07-08 A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting2025-07-06 First Steps Towards Voice Anonymization for Code-Switching Speech2025-07-02 MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement2025-07-01 AUTOMATIC PRONUNCIATION MISTAKE DETECTOR PROJECT REPORT2025-06-25