TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Mitigating the Impact of Speech Recognition Errors on Spok...

Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation

Chia-Hsuan Lee, Yun-Nung Chen, Hung-Yi Lee

2019-04-16Speech RecognitionAutomatic Speech RecognitionQuestion AnsweringAutomatic Speech Recognition (ASR)speech-recognitionSpoken Language UnderstandingDomain Adaptation
PaperPDFCodeCode

Abstract

Spoken question answering (SQA) is challenging due to complex reasoning on top of the spoken documents. The recent studies have also shown the catastrophic impact of automatic speech recognition (ASR) errors on SQA. Therefore, this work proposes to mitigate the ASR errors by aligning the mismatch between ASR hypotheses and their corresponding reference transcriptions. An adversarial model is applied to this domain adaptation task, which forces the model to learn domain-invariant features the QA model can effectively utilize in order to improve the SQA results. The experiments successfully demonstrate the effectiveness of our proposed model, and the results are better than the previous best model by 2% EM score.

Results

TaskDatasetMetricValueModel
DialogueSpoken-SQuADF1 score63.11QANet + GAN
Spoken Language UnderstandingSpoken-SQuADF1 score63.11QANet + GAN
Dialogue UnderstandingSpoken-SQuADF1 score63.11QANet + GAN

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine2025-07-17NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech2025-07-17From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering2025-07-17Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It2025-07-17City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning2025-07-17A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique2025-07-17Describe Anything Model for Visual Question Answering on Text-rich Images2025-07-16