TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Supervised online diarization with sample mean loss for mu...

Supervised online diarization with sample mean loss for multi-domain data

Enrico Fini, Alessio Brutti

2019-11-04ClusteringSpeaker Diarization
PaperPDFCode(official)

Abstract

Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model that significantly improve the learning efficiency and the overall diarization performance. In particular, we introduce a novel loss function, we called Sample Mean Loss and we present a better modelling of the speaker turn behaviour, by devising an analytical expression to compute the probability of a new speaker joining the conversation. In addition, we demonstrate that our model can be trained on fixed-length speech segments, removing the need for speaker change information in inference. Using x-vectors as input features, we evaluate our proposed approach on the multi-domain dataset employed in the DIHARD II challenge: our online method improves with respect to the original UIS-RNN and achieves similar performance to an offline agglomerative clustering baseline using PLDA scoring.

Results

TaskDatasetMetricValueModel
Speaker DiarizationDIHARD IIDER - no overlap19.4UIS-RNN-SML
Speaker DiarizationDIHARD IIDER(%)27.3UIS-RNN-SML

Related Papers

Tri-Learn Graph Fusion Network for Attributed Graph Clustering2025-07-18Ranking Vectors Clustering: Theory and Applications2025-07-16Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework2025-07-11GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning2025-07-09Consistency and Inconsistency in $K$-Means Clustering2025-07-08MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations2025-07-03Supercm: Revisiting Clustering for Semi-Supervised Learning2025-06-30Temporal Rate Reduction Clustering for Human Motion Segmentation2025-06-26