The IBM 2016 English Conversational Telephone Speech Recognition System

George Saon, Tom Sercu, Steven Rennie, Hong-Kwang J. Kuo

2016-04-27Speech Recognition speech-recognition Language Modelling

Abstract

We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bidirectional long short-term memory nets which operate on FMLLR and i-vector features. On the language modeling side, we use an updated model "M" and hierarchical neural network LMs.

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	swb_hub_500 WER fullSWBCH	Percentage error	12.2	RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model
Speech Recognition	Switchboard + Hub500	Percentage error	6.6	RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model
Speech Recognition	Switchboard + Hub500	Percentage error	6.9	IBM 2016

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21 Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine2025-07-17 NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech2025-07-17 Making Language Model a Hierarchical Classifier and Generator2025-07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17 The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17 Assay2Mol: large language model-based drug design using BioAssay context2025-07-16