
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, James Glass

2022-05-06
Speech Recognition, Automatic Speech Recognition, Automatic Speech Recognition (ASR), speech-recognition, Multi-Task Learning, Utterance-level pronunciation scoring, Word-level pronunciation scoring, Phone-level pronunciation scoring

Abstract

Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness Of Pronunciation feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on Librispeech.
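
The abstract says only that GOPT is trained with multi-task learning across aspects and granularities. As a rough illustration of what such an objective could look like, the sketch below averages a mean-squared-error term over several (granularity, aspect) tasks. The equal weighting, the choice of MSE, the function names, and all score values are assumptions made for illustration; none of them is taken from the paper.

```python
from statistics import fmean


def mse(pred, target):
    """Mean squared error between two equal-length score sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


def multitask_loss(preds, targets):
    """Average the per-task MSE over every task key found in `targets`.

    Equal task weighting is an assumption of this sketch.
    """
    return fmean(mse(preds[task], targets[task]) for task in targets)


# Hypothetical scores keyed by (granularity, aspect); values are illustrative only.
preds = {
    ("phone", "accuracy"): [1.8, 0.9, 2.0],
    ("word", "accuracy"): [7.5, 9.0],
    ("utterance", "fluency"): [8.0],
}
targets = {
    ("phone", "accuracy"): [2.0, 1.0, 2.0],
    ("word", "accuracy"): [8.0, 9.0],
    ("utterance", "fluency"): [7.0],
}

loss = multitask_loss(preds, targets)
print(f"combined loss: {loss:.4f}")
```

Averaging per task first, then across tasks, keeps a granularity with many items (phones) from dominating one with few (utterances). A real system might weight or normalize the tasks differently.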

Results

Task                      Dataset         Metric                                 Value  Model
Pronunciation Assessment  speechocean762  Pearson correlation coefficient (PCC)  0.68   GOPT-PAII
Pronunciation Assessment  speechocean762  Pearson correlation coefficient (PCC)  0.61   GOPT-Librispeech
Pronunciation Assessment  speechocean762  Pearson correlation coefficient (PCC)  0.6    GOPT-PAII
Pronunciation Assessment  speechocean762  Pearson correlation coefficient (PCC)  0.55   GOPT-Librispeech
Pronunciation Assessment  speechocean762  Pearson correlation coefficient (PCC)  0.74   GOPT-Librispeech
Pronunciation Assessment  speechocean762  Pearson correlation coefficient (PCC)  0.73   GOPT-PAII
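
The results are reported as Pearson correlation coefficients (PCC) between model predictions and reference scores. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the standard formula. The function name and the sample scores are hypothetical, and this is not the evaluation code behind the numbers listed here.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical human and predicted scores, for illustration only.
human = [2.0, 4.0, 6.0, 8.0]
predicted = [1.0, 3.0, 7.0, 9.0]
r = pearson_correlation(human, predicted)
print(f"PCC = {r:.3f}")
```

PCC measures linear agreement and is unaffected by shifting or positively rescaling either score sequence, so a high value does not by itself mean the predicted scores match the reference scale.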

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Robust-Multi-Task Gradient Boosting (2025-07-15)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers (2025-07-14)
SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation (2025-07-10)
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis (2025-07-08)
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting (2025-07-06)