Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Jasper: An End-to-End Convolutional Neural Acoustic Model

Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

Date: 2019-04-05
Tasks: Speech Recognition, Language Modelling

Abstract

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep architecture performs as well as or better than more complex choices. Our deepest Jasper variant uses 54 convolutional layers. With this architecture, we achieve 2.95% WER using a beam-search decoder with an external neural language model and 3.86% WER with a greedy decoder on LibriSpeech test-clean. We also report competitive results on the Wall Street Journal and the Hub5'00 conversational evaluation datasets.
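The abstract mentions a greedy decoder but does not say what the acoustic model outputs. As a rough sketch only, assuming a CTC-style output (one score vector per audio frame over a character alphabet that includes a blank symbol), greedy decoding could look like the following. The function name `greedy_ctc_decode`, the toy alphabet, and the blank index are hypothetical and not taken from the paper.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, alphabet, blank=0):
    """Greedy decoding under an assumed CTC-style output layer.

    frame_scores: list of per-frame score lists, one score per symbol.
    alphabet:     list of symbols; alphabet[blank] is the blank placeholder.
    """
    # Step 1: pick the highest-scoring symbol index in every frame.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    # Step 2: collapse consecutive repeats of the same index.
    collapsed = [index for index, _ in groupby(best)]
    # Step 3: drop blanks and map the remaining indices to symbols.
    return "".join(alphabet[index] for index in collapsed if index != blank)


# Toy input (illustrative values only): index 0 is the blank symbol.
toy_alphabet = ["_", "a", "b"]
toy_frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.0, 0.1, 0.9],    # b (repeat, collapsed)
]
decoded = greedy_ctc_decode(toy_frames, toy_alphabet)
```

Greedy decoding uses no language model, which is one plausible reason the abstract reports a higher WER for it (3.86%) than for beam search with an external neural language model (2.95%).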

Results

| Task | Dataset | Metric | Value | Model |
| Speech Recognition | WSJ eval92 | Word Error Rate (WER) | 6.9 | Jasper 10x3 |
| Speech Recognition | Hub5'00 SwitchBoard | CallHome | 16.2 | Jasper DR 10x5 |
| Speech Recognition | Hub5'00 SwitchBoard | SwitchBoard | 7.8 | Jasper DR 10x5 |
| Speech Recognition | LibriSpeech test-clean | Word Error Rate (WER) | 2.84 | Jasper DR 10x5 (+ Time/Freq Masks) |
| Speech Recognition | LibriSpeech test-clean | Word Error Rate (WER) | 2.95 | Jasper DR 10x5 |
| Speech Recognition | LibriSpeech test-other | Word Error Rate (WER) | 7.84 | Jasper DR 10x5 (+ Time/Freq Masks) |
| Speech Recognition | LibriSpeech test-other | Word Error Rate (WER) | 8.79 | Jasper DR 10x5 |
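
For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition, assuming plain whitespace tokenization. The function name `word_error_rate` is illustrative, and the scoring and text normalization used for the reported numbers may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy transcripts (not from any dataset): one substitution, one deletion.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Multiplying the returned fraction by 100 gives a percentage comparable in form to the values in the table.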
