
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

pyannote.audio: neural building blocks for speaker diarization

Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill

2019-11-04 | Action Detection, Activity Detection, Change Detection, Speaker Diarization, BIG-bench Machine Learning

Abstract

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speaker Diarization | ETAPE | DER(%) | 4.9 | pyannote (waveform) |
| Speaker Diarization | ETAPE | FA | 4.2 | pyannote (waveform) |
| Speaker Diarization | ETAPE | Miss | 0.7 | pyannote (waveform) |
| Speaker Diarization | ETAPE | DER(%) | 5.6 | pyannote (MFCC) |
| Speaker Diarization | ETAPE | FA | 5.2 | pyannote (MFCC) |
| Speaker Diarization | ETAPE | Miss | 0.4 | pyannote (MFCC) |
| Speaker Diarization | ETAPE | DER(%) | 7.7 | Baseline |
| Speaker Diarization | ETAPE | FA | 7.5 | Baseline |
| Speaker Diarization | ETAPE | Miss | 0.2 | Baseline |
| Speaker Diarization | DIHARD | DER(%) | 9.9 | pyannote (waveform) |
| Speaker Diarization | DIHARD | FA | 5.7 | pyannote (waveform) |
| Speaker Diarization | DIHARD | Miss | 4.2 | pyannote (waveform) |
| Speaker Diarization | DIHARD | DER(%) | 10.5 | pyannote (MFCC) |
| Speaker Diarization | DIHARD | FA | 6.8 | pyannote (MFCC) |
| Speaker Diarization | DIHARD | Miss | 3.7 | pyannote (MFCC) |
| Speaker Diarization | DIHARD | DER(%) | 11.2 | Baseline (the best result in the literature as of Oct. 2019) |
| Speaker Diarization | DIHARD | FA | 6.5 | Baseline (the best result in the literature as of Oct. 2019) |
| Speaker Diarization | DIHARD | Miss | 4.7 | Baseline (the best result in the literature as of Oct. 2019) |
| Speaker Diarization | AMI | DER(%) | 6.0 | pyannote (waveform) |
| Speaker Diarization | AMI | FA | 3.6 | pyannote (waveform) |
| Speaker Diarization | AMI | Miss | 2.4 | pyannote (waveform) |
| Speaker Diarization | AMI | DER(%) | 6.3 | pyannote (MFCC) |
| Speaker Diarization | AMI | FA | 3.5 | pyannote (MFCC) |
| Speaker Diarization | AMI | Miss | 2.7 | pyannote (MFCC) |
| Multi-Label Classification | CheXpert | NUM RADS BELOW CURVE | 0.2 | Baseline |
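
In the speaker diarization rows above, each DER(%) value matches the sum of the FA (false alarm) and Miss (missed detection) values for the same dataset and model, to within rounding; for example, 4.2 + 0.7 = 4.9 for pyannote (waveform) on ETAPE. The sketch below checks that relationship across all rows. The names `ROWS`, `total_error`, and `check_rows` and the 0.1 tolerance are illustrative assumptions and are not part of pyannote.audio. Note that a diarization error rate in general also includes a speaker confusion term, which this listing does not report.

```python
# Values copied from the results listing: (dataset, model, FA, Miss, reported DER).
ROWS = [
    ("ETAPE", "pyannote (waveform)", 4.2, 0.7, 4.9),
    ("ETAPE", "pyannote (MFCC)", 5.2, 0.4, 5.6),
    ("ETAPE", "Baseline", 7.5, 0.2, 7.7),
    ("DIHARD", "pyannote (waveform)", 5.7, 4.2, 9.9),
    ("DIHARD", "pyannote (MFCC)", 6.8, 3.7, 10.5),
    ("DIHARD", "Baseline", 6.5, 4.7, 11.2),
    ("AMI", "pyannote (waveform)", 3.6, 2.4, 6.0),
    ("AMI", "pyannote (MFCC)", 3.5, 2.7, 6.3),
]


def total_error(fa, miss):
    """Sum false alarm and missed detection rates, rounded to one decimal."""
    return round(fa + miss, 1)


def check_rows(rows, tolerance=0.1):
    """Return the rows whose FA + Miss differs from the reported value
    by more than `tolerance` (an assumed allowance for rounding)."""
    mismatches = []
    for dataset, model, fa, miss, reported in rows:
        # Small epsilon guards against floating-point noise at the boundary.
        if abs(total_error(fa, miss) - reported) > tolerance + 1e-9:
            mismatches.append((dataset, model))
    return mismatches


if __name__ == "__main__":
    for dataset, model, fa, miss, reported in ROWS:
        print(f"{dataset:7s} {model:20s} FA+Miss={total_error(fa, miss):5.1f} reported={reported:5.1f}")
    print("mismatches:", check_rows(ROWS))
```

The AMI pyannote (MFCC) row is the only one where the sum (6.2) and the reported value (6.3) differ, by 0.1, which is why the check allows a small tolerance rather than requiring exact equality.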
