Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Speech Model Pre-training for End-to-End Spoken Language Understanding

Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio

2019-04-07 | Speech-to-Text | Spoken Language Understanding

Abstract

Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is first pre-trained to predict words and phonemes, thus learning good features for SLU. We introduce a new SLU dataset, Fluent Speech Commands, and show that our method improves performance both when the full dataset is used for training and when only a small subset is used. We also describe preliminary experiments to gauge the model's ability to generalize to new phrases not heard during training.
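The end-to-end idea in the abstract, mapping a variable-length sequence of speech features directly to an intent, can be illustrated with a small sketch. This is not the authors' implementation: it assumes max pooling over time followed by a single linear layer, and the feature values, weights, and intent names are hypothetical toy data. In the paper's setting, the frame features would come from a model pre-trained to predict words and phonemes.

```python
from typing import List, Sequence


def max_pool_over_time(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a variable-length sequence of feature vectors into one vector
    by taking the maximum of each feature dimension across all frames."""
    if not frames:
        raise ValueError("need at least one frame")
    return [max(column) for column in zip(*frames)]


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Compute one score per intent: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_intent(frames: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float],
                   intents: Sequence[str]) -> str:
    """Pool the frame features, score each intent, and return the best one."""
    pooled = max_pool_over_time(frames)
    scores = linear(pooled, weights, bias)
    best = max(range(len(scores)), key=scores.__getitem__)
    return intents[best]


# Hypothetical toy inputs: three frames of two-dimensional features.
FRAMES = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]  # assumed, untrained weights
BIAS = [0.0, 0.0]
INTENTS = ["activate_lights", "increase_volume"]  # hypothetical labels

if __name__ == "__main__":
    print(predict_intent(FRAMES, WEIGHTS, BIAS, INTENTS))
```

Because pooling removes the time dimension, the same classifier can handle utterances of any length, which is one reason a pooling step is a natural fit for intent classification.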

Results

Task | Dataset | Metric | Value | Model
Dialogue | Fluent Speech Commands | Accuracy (%) | 98.8 | Pooling classifier pre-trained using force-aligned phoneme and word labels on LibriSpeech
Spoken Language Understanding | Fluent Speech Commands | Accuracy (%) | 98.8 | Pooling classifier pre-trained using force-aligned phoneme and word labels on LibriSpeech
Dialogue Understanding | Fluent Speech Commands | Accuracy (%) | 98.8 | Pooling classifier pre-trained using force-aligned phoneme and word labels on LibriSpeech

Related Papers

An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments (2025-07-14)
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization (2025-06-20)
End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data (2025-06-19)
I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs (2025-06-17)
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation (2025-06-11)
Advancing STT for Low-Resource Real-World Speech (2025-06-10)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark (2025-06-05)
Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios (2025-05-30)