SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

2019-04-18
Tasks: Speech Recognition, Automatic Speech Recognition (ASR), Data Augmentation, Language Modelling

Abstract

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment to Listen, Attend and Spell (LAS) networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model, compared with 7.5% WER for the previous state-of-the-art hybrid system. For Switchboard, we achieve 7.2%/14.6% WER on the Switchboard/CallHome portions of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, compared with 8.3%/17.3% WER for the previous state-of-the-art hybrid system.
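
The abstract names three operations applied to the filter-bank features: time warping, frequency masking and time masking. Below is a minimal NumPy sketch of such a policy on a (time, frequency) array, not the authors' implementation. Several details are assumptions of this sketch rather than statements from the abstract: mask widths drawn uniformly up to F (channels) or T (frames) with uniform start positions, masked cells filled with 0.0 (which presumes mean-normalized features), the parameter names W, F, T and their default values, and a simplified piecewise-linear 1-D time warp that is not necessarily the warp used in the paper.

```python
import numpy as np


def time_warp(spec: np.ndarray, W: int, rng: np.random.Generator) -> np.ndarray:
    """Simplified time warp on a (time, freq) array (assumed variant).

    Picks an anchor frame w0 and moves it to w0 + w with |w| <= W, stretching
    one side of the utterance and compressing the other. All frequency
    channels are warped together.
    """
    tau = spec.shape[0]
    if W <= 0 or tau <= 2 * W + 2:
        return spec.copy()  # too short (or W == 0): leave unwarped
    w0 = int(rng.integers(W + 1, tau - W - 1))  # anchor frame, away from the edges
    w = int(rng.integers(-W, W + 1))  # displacement in frames
    out_t = np.arange(tau, dtype=np.float64)
    # For each output frame, the fractional source frame it is read from:
    # the first and last frames stay fixed, the anchor moves from w0 to w0 + w.
    src_t = np.interp(out_t, [0, w0 + w, tau - 1], [0, w0, tau - 1])
    src_t = np.clip(src_t, 0, tau - 1)
    lo = np.floor(src_t).astype(int)
    hi = np.minimum(lo + 1, tau - 1)
    frac = (src_t - lo)[:, None]
    warped = (1.0 - frac) * spec[lo] + frac * spec[hi]  # linear interpolation
    return warped.astype(spec.dtype)


def freq_mask(spec: np.ndarray, F: int, num_masks: int, rng: np.random.Generator) -> np.ndarray:
    """Zero num_masks bands of f consecutive channels, with f uniform in 0..F."""
    out = spec.copy()
    nu = out.shape[1]
    for _ in range(num_masks):
        f = min(int(rng.integers(0, F + 1)), nu)  # band width
        f0 = int(rng.integers(0, nu - f + 1))  # first masked channel
        out[:, f0:f0 + f] = 0.0
    return out


def time_mask(spec: np.ndarray, T: int, num_masks: int, rng: np.random.Generator) -> np.ndarray:
    """Zero num_masks blocks of t consecutive frames, with t uniform in 0..T."""
    out = spec.copy()
    tau = out.shape[0]
    for _ in range(num_masks):
        t = min(int(rng.integers(0, T + 1)), tau)  # block length
        t0 = int(rng.integers(0, tau - t + 1))  # first masked frame
        out[t0:t0 + t, :] = 0.0
    return out


def spec_augment(spec, W=5, F=10, num_freq_masks=1, T=20, num_time_masks=1, seed=None):
    """Warp, then frequency-mask, then time-mask a (time, freq) feature array."""
    rng = np.random.default_rng(seed)
    out = time_warp(spec, W, rng)
    out = freq_mask(out, F, num_freq_masks, rng)
    return time_mask(out, T, num_time_masks, rng)


# Usage: 100 frames of 80 filter-bank channels (random stand-in features).
features = np.random.default_rng(0).normal(size=(100, 80)).astype(np.float32)
augmented = spec_augment(features, seed=1)
print(augmented.shape)  # (100, 80)
```

Each function returns a new array, so the original features are left untouched, and because fresh masks are drawn on every call, the same utterance can receive a different augmentation each time it is seen during training.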

Results

Task	Dataset	Metric	Value (%)	Model
Speech Recognition	Hub5'00 (CallHome)	WER	14.6	LAS + SpecAugment (with LM, Switchboard mild policy)
Speech Recognition	Hub5'00 (Switchboard)	WER	6.8	LAS + SpecAugment (with LM, Switchboard mild policy)
Speech Recognition	Hub5'00 (CallHome)	WER	14	LAS + SpecAugment (with LM, Switchboard strong policy)
Speech Recognition	Hub5'00 (Switchboard)	WER	7.1	LAS + SpecAugment (with LM, Switchboard strong policy)
Speech Recognition	LibriSpeech test-clean	WER	2.5	LAS + SpecAugment
Speech Recognition	LibriSpeech test-clean	WER	2.7	LAS (no LM)
Speech Recognition	LibriSpeech test-other	WER	5.8	LAS + SpecAugment
Speech Recognition	LibriSpeech test-other	WER	6.5	LAS (no LM)
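
All values above are word error rates (WER) in percent. WER is the number of word substitutions S, deletions D and insertions I needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. The following is a minimal per-utterance sketch in Python. It assumes plain whitespace tokenization and no text normalization, whereas benchmark scoring tools apply their own normalization and usually accumulate errors over the whole test set before dividing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and hyp[:j]; row 0 needs j insertions to produce hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1  # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.1f}%")  # 33.3% (one substitution + one deletion over 6 words)
```

Because insertions count as errors, WER is not bounded by 100%: a hypothesis with many extra words can score above it.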
