Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Qwen-Audio

Reported on 14 benchmarks across 6 tasks · 1 paper · 8 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Audio · 13 results

  • Speech Recognition on AISHELL-2 Test Android
    Word Error Rate (WER) · uses extra data · 2023-11-14
    3.3
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Speech Recognition on AISHELL-2 Test IOS
    Word Error Rate (WER) · uses extra data · 2023-11-14
    3.1
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Speech Recognition on AISHELL-2 Test Mic
    Word Error Rate (WER) · uses extra data · 2023-11-14
    3.3
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Speech Recognition on AISHELL-1
    Word Error Rate (WER) · uses extra data · 2023-11-14
    1.29
    best: 0.55 (FireRedASR-AED)
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Audio Classification on VocalSound
    Accuracy · uses extra data · 2023-11-14
    92.89
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Acoustic Scene Classification on TUT Acoustic Scenes 2017
    1:1 Accuracy · uses extra data · 2023-11-14
    0.649
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Acoustic Scene Classification on CochlScene
    1:1 Accuracy · uses extra data · 2023-11-14
    0.795
    best: 0.83 (Audio Flamingo)
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Speech Recognition on LibriSpeech test-clean
    Word Error Rate (WER) · 2023-11-14
    2
    best: 0.985 (United Med ASR)
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Speech Recognition on LibriSpeech test-other
    Word Error Rate (WER) · 2023-11-14
    4.2
    best: 2.48 (SAMBA ASR)
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Emotion Recognition on MELD
    Accuracy · uses extra data · 2023-11-14
    55.7
    best: 68.7 (ELR-GNN)
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Audio captioning on Clotho
    CIDEr · uses extra data · 2023-11-14
    0.441
    best: 14 (ZerAuCap)
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Audio captioning on Clotho
    SPICE · uses extra data · 2023-11-14
    0.136
    best: 5.3 (ZerAuCap)
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
  • Audio captioning on Clotho
    SPIDEr · uses extra data · 2023-11-14
    0.288
    best: 9.7 (ZerAuCap)
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919
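
For readers unfamiliar with the Word Error Rate metric used in the speech recognition entries above, here is a minimal sketch of the standard definition: (substitutions + deletions + insertions) divided by the number of reference words, computed with word-level edit distance. The function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and the result is scaled to a percentage on the assumption that the listed values (e.g. 3.3) are percentages. It is not the evaluation code behind any result listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Assumption: words are separated by whitespace. Real evaluations usually
    also normalize case and punctuation, which this sketch does not do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.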

Methodology · 1 result

  • Classification on VocalSound
    Accuracy · uses extra data · 2023-11-14
    92.89
    SOTA
    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · arXiv:2311.07919