Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Audio captioning on AudioCaps

Metric: SPIDEr (higher is better)
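
For reference, SPIDEr is conventionally defined as the arithmetic mean of the CIDEr-D and SPICE caption scores. A minimal Python sketch of that combination follows. The function name `spider` and the input values are hypothetical illustrations, not leaderboard data and not any library's API.

```python
def spider(cider_d: float, spice: float) -> float:
    """Return SPIDEr as the arithmetic mean of CIDEr-D and SPICE."""
    return 0.5 * (cider_d + spice)


# Hypothetical per-system scores, for illustration only (not taken from the table).
example_cider_d = 0.80
example_spice = 0.20
example_spider = spider(example_cider_d, example_spice)
print(f"SPIDEr = {example_spider:.3f}")
```

Because the result is a plain average, its scale follows whatever scale the two input scores use, so entries reported as fractions and entries reported as percentages are not directly comparable without rescaling.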

Results

| # | Model | SPIDEr | Extra Data | Paper | Date | Code |
|---|-------|--------|------------|-------|------|------|
| 1 | Audio Flamingo | 32.6 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Code |
| 2 | ZerAuCap | 18.3 | Yes | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 3 | MQ-Cap | 0.519 | Yes | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | - |
| 4 | SLAM-AAC | 0.518 | Yes | SLAM-AAC: Enhancing Audio Captioning with Paraph... | 2024-10-12 | Code |
| 5 | LAVCap | 0.517 | No | LAVCap: LLM-based Audio-Visual Captioning using ... | 2025-01-16 | Code |
| 6 | EnCLAP++-large | 0.51 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Code |
| 7 | AutoCap | 0.507 | No | Taming Data and Transformers for Audio Generation | 2024-06-27 | Code |
| 8 | LOAE | 0.505 | Yes | Enhancing Automated Audio Captioning via Large L... | 2024-06-19 | Code |
| 9 | EnCLAP++-base | 0.501 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Code |
| 10 | EnCLAP-large | 0.4954 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Code |
| 11 | CNext-trans | 0.4951 | No | - | - | - |
| 12 | EnCLAP-base | 0.4829 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Code |
| 13 | AL-MixGen + Multi-TTA | 0.475 | No | - | - | - |
| 14 | Rethink-ACT (AST + TF + MIL) | 0.472 | No | - | - | - |
| 15 | AL-MixGen | 0.466 | No | Exploring Train and Test-Time Augmentations for ... | 2022-10-31 | - |
| 16 | BART + YAMNet + PANNs | 0.465 | No | - | - | Code |
| 17 | CNN+Transformer | 0.426 | No | Audio Captioning Transformer | 2021-07-21 | Code |
| 18 | TopDown-AlignedAtt (1NN) | 0.369 | No | - | - | - |
| 19 | No audio (baseline) | 0 | No | - | - | Code |