TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Hierarchical Pronunciation Assessment with Multi-Aspect At...

Hierarchical Pronunciation Assessment with Multi-Aspect Attention

Heejin Do, Yunsu Kim, Gary Geunbae Lee

2022-11-15Utterance-level pronounciation scoringMulti-Task LearningWord-level pronunciation scoringPhone-level pronunciation scoring
PaperPDFCode(official)

Abstract

Automatic pronunciation assessment is a major component of a computer-assisted pronunciation training system. To provide in-depth feedback, scoring pronunciation at various levels of granularity such as phoneme, word, and utterance, with diverse aspects such as accuracy, fluency, and completeness, is essential. However, existing multi-aspect multi-granularity methods simultaneously predict all aspects at all granularity levels; therefore, they have difficulty in capturing the linguistic hierarchy of phoneme, word, and utterance. This limitation further leads to neglecting intimate cross-aspect relations at the same linguistic unit. In this paper, we propose a Hierarchical Pronunciation Assessment with Multi-aspect Attention (HiPAMA) model, which hierarchically represents the granularity levels to directly capture their linguistic structures and introduces multi-aspect attention that reflects associations across aspects at the same level to create more connotative representations. By obtaining relational information from both the granularity- and aspect-side, HiPAMA can take full advantage of multi-task learning. Remarkable improvements in the experimental results on the speachocean762 datasets demonstrate the robustness of HiPAMA, particularly in the difficult-to-assess aspects.

Results

TaskDatasetMetricValueModel
Pronunciation Assessmentspeechocean762Pearson correlation coefficient (PCC)0.62HiPAMA-Librispeech
Pronunciation Assessmentspeechocean762Pearson correlation coefficient (PCC)0.59HiPAMA-Librispeech
Pronunciation Assessmentspeechocean762Pearson correlation coefficient (PCC)0.754HiPAMA-Librispeech

Related Papers

SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17Robust-Multi-Task Gradient Boosting2025-07-15SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation2025-07-10Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration2025-06-25AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation2025-06-24An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify2025-06-23SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning2025-06-18Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment2025-06-17