Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Audio captioning on AudioCaps

Metric: SPICE (higher is better)


Results

| # | Model | SPICE | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Audio Flamingo | 15.1 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Code |
| 2 | ZerAuCap | 8.6 | Yes | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 3 | EnCLAP++-large | 0.197 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Code |
| 4 | MQ-Cap | 0.194 | Yes | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | - |
| 5 | SLAM-AAC | 0.194 | Yes | SLAM-AAC: Enhancing Audio Captioning with Paraph... | 2024-10-12 | Code |
| 6 | LOAE | 0.193 | Yes | Enhancing Automated Audio Captioning via Large L... | 2024-06-19 | Code |
| 7 | EnCLAP++-base | 0.188 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Code |
| 8 | EnCLAP-large | 0.1879 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Code |
| 9 | EnCLAP-base | 0.1863 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Code |
| 10 | LAVCap | 0.185 | No | LAVCap: LLM-based Audio-Visual Captioning using ... | 2025-01-16 | Code |
| 11 | CNext-trans | 0.1841 | No | - | - | - |
| 12 | AutoCap | 0.182 | No | Taming Data and Transformers for Audio Generation | 2024-06-27 | Code |
| 13 | AL-MixGen + Multi-TTA | 0.181 | No | - | - | - |
| 14 | Rethink-ACT (AST + TF + MIL) | 0.18 | No | - | - | - |
| 15 | AL-MixGen | 0.177 | No | Exploring Train and Test-Time Augmentations for ... | 2022-10-31 | - |
| 16 | BART + YAMNet + PANNs | 0.176 | No | - | - | Code |
| 17 | CNN+Transformer | 0.159 | No | Audio Captioning Transformer | 2021-07-21 | Code |
| 18 | TopDown-AlignedAtt (1NN) | 0.144 | No | - | - | - |
| 19 | No audio (baseline) | 0 | No | - | - | Code |