Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation

Chia-Hsuan Lee, Yun-Nung Chen, Hung-Yi Lee

2019-04-16

Speech Recognition, Automatic Speech Recognition (ASR), Question Answering, Spoken Language Understanding, Domain Adaptation


Abstract

Spoken question answering (SQA) is challenging because it requires complex reasoning over spoken documents. Recent studies have also shown that automatic speech recognition (ASR) errors have a catastrophic impact on SQA. This work therefore proposes to mitigate ASR errors by reducing the mismatch between ASR hypotheses and their corresponding reference transcriptions. An adversarial model is applied to this domain adaptation task, forcing the model to learn domain-invariant features that the QA model can use effectively to improve SQA results. Experiments demonstrate the effectiveness of the proposed model, which outperforms the previous best model by 2% in EM score.
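To make the adversarial idea in the abstract concrete, here is a minimal sketch of a generic GAN-style domain adaptation objective. It is an illustration only: the abstract does not spell out the exact losses, so the binary cross-entropy formulation, the function names, and the `adv_weight` parameter are all assumptions rather than the authors' implementation. A domain discriminator scores encoder features as "reference transcription" (label 1) or "ASR hypothesis" (label 0), while the encoder is additionally trained to make ASR features look like reference features.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw discriminator logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob: float, label: int) -> float:
    """Binary cross-entropy for one example (eps avoids log(0))."""
    eps = 1e-12
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1.0 - prob + eps))


def discriminator_loss(ref_logits: Sequence[float], asr_logits: Sequence[float]) -> float:
    """Hypothetical discriminator objective: reference -> 1, ASR -> 0."""
    losses = [bce(sigmoid(z), 1) for z in ref_logits]
    losses += [bce(sigmoid(z), 0) for z in asr_logits]
    return sum(losses) / len(losses)


def encoder_adversarial_loss(asr_logits: Sequence[float]) -> float:
    """Hypothetical encoder objective: fool the discriminator on ASR features."""
    losses = [bce(sigmoid(z), 1) for z in asr_logits]
    return sum(losses) / len(losses)


def total_encoder_loss(qa_loss: float, asr_logits: Sequence[float],
                       adv_weight: float = 0.1) -> float:
    """QA loss plus a weighted adversarial term (adv_weight is an assumed value)."""
    return qa_loss + adv_weight * encoder_adversarial_loss(asr_logits)
```

When the discriminator separates the two domains confidently, its own loss is low and the encoder's adversarial loss is high; training the encoder against that term pushes the two feature distributions together, which is the sense in which the features become domain-invariant.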

Results

Task	Dataset	Metric	Value	Model
Dialogue	Spoken-SQuAD	F1 score	63.11	QANet + GAN
Spoken Language Understanding	Spoken-SQuAD	F1 score	63.11	QANet + GAN
Dialogue Understanding	Spoken-SQuAD	F1 score	63.11	QANet + GAN
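
The table reports F1, while the abstract quotes the improvement in EM (exact match). As a rough guide to how these two metrics relate, the sketch below follows the common SQuAD-style convention of normalizing answers (lowercasing, removing punctuation and English articles) before comparing them. It assumes Spoken-SQuAD is scored this way; the function names are illustrative and this is not the official evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM gives no credit for a partially correct span, whereas F1 rewards token overlap, so a prediction such as "in Paris France" against the reference "Paris" scores 0 on EM but 0.5 on F1. Dataset-level scores are typically the average over all questions, expressed as a percentage.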
