Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Whisper v2

Reported on 29 benchmarks across 1 task · 2 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
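
As a rough illustration of what exact-name matching implies, the sketch below groups a few rows from this page by their model name string. The function name `group_by_exact_name` and the tuple layout are hypothetical, and this is only an assumption about how such matching behaves, not the site's actual implementation.

```python
from collections import defaultdict

# Illustrative rows copied from the listing on this page:
# (model name, benchmark, metric, value).
results = [
    ("Whisper v2", "Jam-ALT Spanish", "Case Error Rate", 6.5),
    ("Whisper v3 +demucs", "Jam-ALT Spanish", "Case Error Rate", 3.6),
    ("Whisper v2", "Jam-ALT German", "Case Error Rate", 5.3),
    ("Whisper v3", "Jam-ALT German", "Case Error Rate", 4.0),
]


def group_by_exact_name(rows):
    """Group rows by the raw model name string, with no normalization.

    Hypothetical helper: names that differ by case, spacing, or a suffix
    such as "+demucs" end up in separate groups.
    """
    grouped = defaultdict(list)
    for name, benchmark, metric, value in rows:
        grouped[name].append((benchmark, metric, value))
    return dict(grouped)


by_model = group_by_exact_name(results)
print(sorted(by_model))             # the distinct name strings
print(len(by_model["Whisper v2"]))  # rows filed under this exact name
```

Under this assumption, "Whisper v3" and "Whisper v3 +demucs" are treated as unrelated models, and two papers that both report a model as "Whisper v2" are pooled together even if they evaluated different variants.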

Audio · 48 results

  • Speech Recognition on Jam-ALT
    Case-Sensitive Word Error Rate · 2024-07-30
    42.1
    best: 20.1 (AudioShake v3)
    SOTA
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT French
    Case-Sensitive Word Error Rate · 2024-07-30
    31.1
    best: 23.5 (AudioShake v3)
    SOTA
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT Spanish
    Case-Sensitive Word Error Rate · 2024-07-30
    31.5
    best: 17.7 (AudioShake v3)
    SOTA
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT German
    Case-Sensitive Word Error Rate · 2024-07-30
    59.3
    best: 17.5 (AudioShake v3)
    SOTA
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT English
    Case-Sensitive Word Error Rate · 2024-07-30
    47.5
    best: 20.9 (AudioShake v3)
    SOTA
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT
    Case Error Rate · 2023-11-23
    4.5
    best: 3.4 (AudioShake v1)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT
    Word Error Rate (WER) · 2023-11-23
    35.7
    best: 16.1 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT French
    Word Error Rate (WER) · 2023-11-23
    27.7
    best: 20.8 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT French
    Case Error Rate · 2023-11-23
    3.2
    best: 2 (AudioShake v1)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT French
    Punctuation F-1 · 2023-11-23
    45.8
    best: 46.1 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT Spanish
    Case Error Rate · 2023-11-23
    6.5
    best: 3.6 (Whisper v3 +demucs)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT Spanish
    Punctuation F-1 · 2023-11-23
    50
    best: 56.7 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT Spanish
    Word Error Rate (WER) · 2023-11-23
    25.7
    best: 12.6 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT German
    Case Error Rate · 2023-11-23
    5.3
    best: 4 (Whisper v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT German
    Word Error Rate (WER) · 2023-11-23
    45.4
    best: 12.6 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT English
    Word Error Rate (WER) · 2023-11-23
    43.8
    best: 17.3 (AudioShake v3)
    SOTA
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT
    Line break F1 · 2024-07-30
    69.3
    best: 84.4 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT
    Section break F1 · 2024-07-30
    3.3
    best: 73.9 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT
    Punctuation F1 · 2024-07-30
    44.2
    best: 57 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT
    Word Error Rate (WER) · 2024-07-30
    37.8
    best: 16.1 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT French
    Punctuation F-1 · 2024-07-30
    45.9
    best: 46.1 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT Spanish
    Line break F-1 · 2024-07-30
    71.7
    best: 82.7 (AudioShake v1)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT Spanish
    Section break F-1 · 2024-07-30
    3.1
    best: 69.6 (AudioShake v1)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT Spanish
    Punctuation F-1 · 2024-07-30
    52.8
    best: 56.7 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT Spanish
    Word Error Rate (WER) · 2024-07-30
    25.8
    best: 12.6 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT German
    Line break F-1 · 2024-07-30
    70
    best: 83.7 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT German
    Punctuation F-1 · 2024-07-30
    47.1
    best: 57.1 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT German
    Word Error Rate (WER) · 2024-07-30
    54.5
    best: 12.6 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT English
    Punctuation F-1 · 2024-07-30
    31.5
    best: 65.3 (AudioShake v3)
    Lyrics Transcription for Humans: A Readability-Aware Benchmark · arXiv:2408.06370
  • Speech Recognition on Jam-ALT
    Punctuation F1 · 2023-11-23
    41.7
    best: 57 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT French
    Line break F-1 · 2023-11-23
    73.4
    best: 88.6 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT French
    Section break F-1 · 2023-11-23
    1.4
    best: 72.5 (AudioShake v1)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT French
    Word Error Rate (WER) · 2023-11-23
    27.7
    best: 20.8 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT German
    Line break F-1 · 2023-11-23
    69.9
    best: 83.7 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT German
    Punctuation F-1 · 2023-11-23
    38.7
    best: 57.1 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT English
    Line break F-1 · 2023-11-23
    63
    best: 84.3 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT English
    Section break F-1 · 2023-11-23
    11.2
    best: 84.8 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT English
    Case Error Rate · 2023-11-23
    3.5
    best: 3.4 (AudioShake v1)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT English
    Punctuation F-1 · 2023-11-23
    31.3
    best: 65.3 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987
  • Speech Recognition on Jam-ALT English
    Word Error Rate (WER) · 2023-11-23
    43.8
    best: 17.3 (AudioShake v3)
    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark · arXiv:2311.13987