Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Whisper v3 +demucs

Reported on 25 benchmarks across 1 task · 2 papers · 14 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
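As a rough illustration of what exact-name matching implies, the sketch below groups a few rows taken from this page by their literal model string. The `group_by_exact_name` helper and the tuple layout are hypothetical and are not the site's actual implementation; the values are the ones listed on this page.

```python
from collections import defaultdict

def group_by_exact_name(rows):
    """Group result rows by the literal model-name string (hypothetical helper)."""
    groups = defaultdict(list)
    for model_name, benchmark, metric, value, paper in rows:
        # No normalisation: "Whisper v3" and "Whisper v3 +demucs" stay separate.
        groups[model_name].append((benchmark, metric, value, paper))
    return dict(groups)

# Rows copied from this page; the tuple layout is an assumption for illustration.
rows = [
    ("Whisper v3 +demucs", "Jam-ALT", "Word Error Rate (WER)", 47.9, "arXiv:2311.13987"),
    ("Whisper v3 +demucs", "Jam-ALT", "Word Error Rate (WER)", 48.0, "arXiv:2408.06370"),
    ("Whisper v3", "Jam-ALT German", "Case Error Rate", 4.0, "arXiv:2311.13987"),
    ("AudioShake v3", "Jam-ALT", "Word Error Rate (WER)", 16.1, "arXiv:2408.06370"),
]

groups = group_by_exact_name(rows)
for name, results in groups.items():
    print(name, len(results))
```

Under this kind of matching, the two papers' Jam-ALT WER values for "Whisper v3 +demucs" land in the same group even if the underlying model variants differ, which is the caveat the note describes.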

Audio · 40 results

Speech Recognition on Jam-ALT
Case-Sensitive Word Error Rate · 2024-07-30
51.6
best: 20.1 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT French
Case-Sensitive Word Error Rate · 2024-07-30
48.2
best: 23.5 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Case-Sensitive Word Error Rate · 2024-07-30
64.9
best: 17.7 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Case-Sensitive Word Error Rate · 2024-07-30
47.4
best: 17.5 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT English
Case-Sensitive Word Error Rate · 2024-07-30
47.2
best: 20.9 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Case Error Rate · 2023-11-23
3.8
best: 3.4 (AudioShake v1)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT
Word Error Rate (WER) · 2023-11-23
47.9
best: 16.1 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Word Error Rate (WER) · 2023-11-23
44.9
best: 20.8 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Word Error Rate (WER) · 2023-11-23
61.5
best: 12.6 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Case Error Rate · 2023-11-23
3.6
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Word Error Rate (WER) · 2023-11-23
43.5
best: 12.6 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Case Error Rate · 2023-11-23
4.4
best: 4 (Whisper v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Word Error Rate (WER) · 2023-11-23
43
best: 17.3 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Case Error Rate · 2023-11-23
4.1
best: 3.4 (AudioShake v1)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT
Line break F1 · 2024-07-30
65.7
best: 84.4 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Line break F1 · 2024-07-30
65.7
best: 84.4 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Punctuation F1 · 2024-07-30
33
best: 57 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Word Error Rate (WER) · 2024-07-30
48
best: 16.1 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT French
Line break F1 · 2024-07-30
69.3
best: 88.6 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT French
Punctuation F1 · 2024-07-30
32
best: 46.1 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Line break F1 · 2024-07-30
52.3
best: 82.7 (AudioShake v1)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Punctuation F1 · 2024-07-30
32.4
best: 56.7 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Line break F1 · 2024-07-30
71.9
best: 83.7 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Punctuation F1 · 2024-07-30
45.4
best: 57.1 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT English
Line break F1 · 2024-07-30
66.9
best: 84.3 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT English
Punctuation F1 · 2024-07-30
25.8
best: 65.3 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Punctuation F1 · 2023-11-23
29
best: 57 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Case Error Rate · 2023-11-23
3.2
best: 2 (AudioShake v1)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Line break F1 · 2023-11-23
69.4
best: 88.6 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Punctuation F1 · 2023-11-23
30.9
best: 46.1 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Word Error Rate (WER) · 2023-11-23
44.9
best: 20.8 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Line break F1 · 2023-11-23
52.4
best: 82.7 (AudioShake v1)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Punctuation F1 · 2023-11-23
28.7
best: 56.7 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Word Error Rate (WER) · 2023-11-23
61.5
best: 12.6 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Line break F1 · 2023-11-23
72
best: 83.7 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Punctuation F1 · 2023-11-23
34
best: 57.1 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Word Error Rate (WER) · 2023-11-23
43.5
best: 12.6 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Line break F1 · 2023-11-23
66.8
best: 84.3 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Punctuation F1 · 2023-11-23
23.3
best: 65.3 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Word Error Rate (WER) · 2023-11-23
43
best: 17.3 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987