Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Whisper v2

Reported on 29 benchmarks across 1 task · 2 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Audio · 48 results

Speech Recognition on Jam-ALT
Case-Sensitive Word Error Rate · 2024-07-30
42.1
best: 20.1 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT French
Case-Sensitive Word Error Rate · 2024-07-30
31.1
best: 23.5 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Case-Sensitive Word Error Rate · 2024-07-30
31.5
best: 17.7 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Case-Sensitive Word Error Rate · 2024-07-30
59.3
best: 17.5 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT English
Case-Sensitive Word Error Rate · 2024-07-30
47.5
best: 20.9 (AudioShake v3)
SOTA
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Case Error Rate · 2023-11-23
4.5
best: 3.4 (AudioShake v1)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT
Word Error Rate (WER) · 2023-11-23
35.7
best: 16.1 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Word Error Rate (WER) · 2023-11-23
27.7
best: 20.8 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Case Error Rate · 2023-11-23
3.2
best: 2 (AudioShake v1)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Punctuation F-1 · 2023-11-23
45.8
best: 46.1 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Case Error Rate · 2023-11-23
6.5
best: 3.6 (Whisper v3 +demucs)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Punctuation F-1 · 2023-11-23
50
best: 56.7 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT Spanish
Word Error Rate (WER) · 2023-11-23
25.7
best: 12.6 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Case Error Rate · 2023-11-23
5.3
best: 4 (Whisper v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Word Error Rate (WER) · 2023-11-23
45.4
best: 12.6 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Word Error Rate (WER) · 2023-11-23
43.8
best: 17.3 (AudioShake v3)
SOTA
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT
Line break F1 · 2024-07-30
69.3
best: 84.4 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Section break F1 · 2024-07-30
3.3
best: 73.9 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Punctuation F1 · 2024-07-30
44.2
best: 57 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Word Error Rate (WER) · 2024-07-30
37.8
best: 16.1 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT French
Punctuation F-1 · 2024-07-30
45.9
best: 46.1 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Line break F-1 · 2024-07-30
71.7
best: 82.7 (AudioShake v1)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Section break F-1 · 2024-07-30
3.1
best: 69.6 (AudioShake v1)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Punctuation F-1 · 2024-07-30
52.8
best: 56.7 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT Spanish
Word Error Rate (WER) · 2024-07-30
25.8
best: 12.6 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Line break F-1 · 2024-07-30
70
best: 83.7 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Punctuation F-1 · 2024-07-30
47.1
best: 57.1 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT German
Word Error Rate (WER) · 2024-07-30
54.5
best: 12.6 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT English
Punctuation F-1 · 2024-07-30
31.5
best: 65.3 (AudioShake v3)
Lyrics Transcription for Humans: A Readability-Aware Benchmark arXiv:2408.06370
Speech Recognition on Jam-ALT
Punctuation F1 · 2023-11-23
41.7
best: 57 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Line break F-1 · 2023-11-23
73.4
best: 88.6 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT French
Section break F-1 · 2023-11-23
1.4
best: 72.5 (AudioShake v1)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Line break F-1 · 2023-11-23
69.9
best: 83.7 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT German
Punctuation F-1 · 2023-11-23
38.7
best: 57.1 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Line break F-1 · 2023-11-23
63
best: 84.3 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Section break F-1 · 2023-11-23
11.2
best: 84.8 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Case Error Rate · 2023-11-23
3.5
best: 3.4 (AudioShake v1)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987
Speech Recognition on Jam-ALT English
Punctuation F-1 · 2023-11-23
31.3
best: 65.3 (AudioShake v3)
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark arXiv:2311.13987