Audio captioning on AudioCaps
Metric: SPIDEr (higher is better)
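In captioning evaluation, SPIDEr is the arithmetic mean of two corpus-level metrics: CIDEr-D, which measures consensus with the reference captions, and SPICE, which measures semantic propositional overlap. The sketch below shows only the final combination step. It assumes both component scores have already been computed by a standard captioning evaluation toolkit. The function name `spider` is hypothetical.

```python
def spider(cider_d: float, spice: float) -> float:
    """Combine corpus-level CIDEr-D and SPICE into SPIDEr (their arithmetic mean).

    Assumption: both inputs are on their usual unscaled ranges
    (CIDEr-D roughly 0-10, SPICE 0-1), not multiplied by 100.
    """
    if cider_d < 0 or spice < 0:
        raise ValueError("CIDEr-D and SPICE scores are non-negative")
    return (cider_d + spice) / 2.0


# Illustrative inputs only, not taken from any submission below.
print(spider(1.0, 0.2))
```

With these inputs the function returns 0.6. CIDEr-D is not capped at 1, so SPIDEr is not strictly bounded by 1 either.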
All SPIDEr scores are shown on a 0 to 1 scale. "Extra Data" indicates whether the model used training data beyond AudioCaps. "-" means not available.

# | Model | SPIDEr | Extra Data | Paper | Date | Code
---|---|---|---|---|---|---
1 | MQ-Cap | 0.519 | Yes | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | -
2 | SLAM-AAC | 0.518 | Yes | SLAM-AAC: Enhancing Audio Captioning with Paraph... | 2024-10-12 | Yes
3 | LAVCap | 0.517 | No | LAVCap: LLM-based Audio-Visual Captioning using ... | 2025-01-16 | Yes
4 | EnCLAP++-large | 0.51 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Yes
5 | AutoCap | 0.507 | No | Taming Data and Transformers for Audio Generation | 2024-06-27 | Yes
6 | LOAE | 0.505 | Yes | Enhancing Automated Audio Captioning via Large L... | 2024-06-19 | Yes
7 | EnCLAP++-base | 0.501 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Yes
8 | EnCLAP-large | 0.4954 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Yes
9 | CNext-trans | 0.4951 | No | - | - | -
10 | EnCLAP-base | 0.4829 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Yes
11 | AL-MixGen + Multi-TTA | 0.475 | No | - | - | -
12 | Rethink-ACT (AST + TF + MIL) | 0.472 | No | - | - | -
13 | AL-MixGen | 0.466 | No | Exploring Train and Test-Time Augmentations for ... | 2022-10-31 | -
14 | BART + YAMNet + PANNs | 0.465 | No | - | - | Yes
15 | CNN+Transformer | 0.426 | No | Audio Captioning Transformer | 2021-07-21 | Yes
16 | TopDown-AlignedAtt (1NN) | 0.369 | No | - | - | -
17 | Audio Flamingo | 0.326 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Yes
18 | ZerAuCap | 0.183 | Yes | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Yes
19 | No audio (baseline) | 0 | No | - | - | Yes
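Submitted SPIDEr values are not always on the same scale. Two entries, Audio Flamingo (32.6) and ZerAuCap (18.3), were entered as percentages rather than on the 0 to 1 scale used by the other submissions. A naive numeric sort therefore places them at the top. The sketch below shows one way to normalize such entries before ranking. It assumes that any value above 1.0 was reported multiplied by 100. That heuristic is only safe for boards like this one, where every genuine 0 to 1 score stays below 1.0, because CIDEr-D, and hence SPIDEr, can in principle exceed 1. The helper names `normalize_spider` and `rank` are hypothetical.

```python
from typing import List, Tuple


def normalize_spider(score: float) -> float:
    # Assumption: values above 1.0 were reported on a x100 (percentage) scale.
    return score / 100.0 if score > 1.0 else score


def rank(entries: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Normalize each (model, SPIDEr) pair, then rank by descending score."""
    normalized = [(name, normalize_spider(score)) for name, score in entries]
    # Higher SPIDEr is better; Python's sort is stable, so ties keep input order.
    normalized.sort(key=lambda entry: entry[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(normalized)]


# Four values as they appear in the submissions.
for position, model, score in rank([
    ("Audio Flamingo", 32.6),
    ("ZerAuCap", 18.3),
    ("MQ-Cap", 0.519),
    ("TopDown-AlignedAtt (1NN)", 0.369),
]):
    print(position, model, round(score, 3))
```

After normalization, Audio Flamingo (0.326) ranks below TopDown-AlignedAtt (0.369), and ZerAuCap (0.183) ranks last among these four.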