Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


End-to-End Neural Speaker Diarization with Permutation-Free Objectives

Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe

2019-09-12 · Clustering · Speaker Diarization · Multi-Label Classification · Domain Adaptation

Paper · PDF · Code (official)

Abstract

In this paper, we propose a novel end-to-end neural-network-based speaker diarization method. Unlike most existing methods, our proposed method does not have separate modules for the extraction and clustering of speaker representations. Instead, our model has a single neural network that directly outputs speaker diarization results. To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function to directly minimize diarization errors without suffering from the speaker-label permutation problem. Besides its end-to-end simplicity, the proposed method also benefits from being able to explicitly handle overlapping speech during training and inference. Because of this benefit, our model can easily be trained and adapted with real-recorded multi-speaker conversations just by feeding the corresponding multi-speaker segment labels. We evaluated the proposed method on simulated speech mixtures. The proposed method achieved a diarization error rate of 12.28%, while a conventional clustering-based system produced a diarization error rate of 28.77%. Furthermore, domain adaptation with real-recorded speech provided a 25.6% relative improvement on the CALLHOME dataset. Our source code is available online at https://github.com/hitachi-speech/EEND.
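The permutation-free objective described in the abstract can be sketched as follows. This is an illustrative NumPy reimplementation, not the authors' code (the official implementation is at the repository above): the loss is evaluated under every permutation of the reference speaker labels, and the minimum is kept, so the network is never penalized for assigning speakers to output slots in a different order than the reference. Function names and array shapes here are my own assumptions.

```python
import itertools
import numpy as np

def bce(pred, label, eps=1e-7):
    """Frame-wise binary cross-entropy between predicted speech
    activities and reference labels, averaged over all entries."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred))

def permutation_free_loss(pred, label):
    """Permutation-free BCE: compute the loss for every speaker
    permutation of the reference labels and return the minimum.

    pred, label: arrays of shape (T, S) -- T frames, S speakers,
    with speech activities in [0, 1].
    """
    n_speakers = label.shape[1]
    losses = [bce(pred, label[:, list(p)])
              for p in itertools.permutations(range(n_speakers))]
    return min(losses)
```

For example, if the network's outputs match the reference but with the two speaker columns swapped, the plain BCE is large while the permutation-free loss is near zero, because the swapped permutation is among the candidates. The factorial number of permutations makes this exhaustive form practical only for small speaker counts.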

Results

Task                | Dataset  | Metric  | Value | Model
Speaker Diarization | CALLHOME | DER (%) | 23.07 | EEND
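For reference, the DER metric in the table is conventionally defined as the sum of three error durations divided by the total reference speech time. A minimal sketch of that standard definition (the helper name is my own):

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed speech + speaker confusion)
             / total reference speech time.
    All arguments are durations in the same unit (e.g. seconds)."""
    return (false_alarm + missed + confusion) / total_speech

# 1 s false alarm + 2 s missed + 3 s confusion over 50 s of speech:
diarization_error_rate(1.0, 2.0, 3.0, 50.0)  # -> 0.12, i.e. 12% DER
```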

Related Papers

Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
Ranking Vectors Clustering: Theory and Applications (2025-07-16)
Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation (2025-07-14)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)
An Offline Mobile Conversational Agent for Mental Health Support: Learning from Emotional Dialogues and Psychological Texts with Student-Centered Evaluation (2025-07-11)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection (2025-07-10)