Audio captioning on AudioCaps
Metric: METEOR (higher is better)
Note: entries ranked 1-4 appear to report METEOR on a 0-100 scale, while entries ranked 5-17 use a 0-1 scale, so the ranking by raw value does not compare the two groups directly.
Results
| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Audio Flamingo | 20.5 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Code |
| 2 | ZerAuCap | 12.3 | Yes | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 3 | Shaharabany et al. | 8.6 | Yes | Zero-Shot Audio Captioning via Audibility Guidance | 2023-09-07 | - |
| 4 | No audio (baseline) | 4.1 | No | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 5 | EnCLAP++-large | 0.269 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Code |
| 6 | SLAM-AAC | 0.268 | Yes | SLAM-AAC: Enhancing Audio Captioning with Paraph... | 2024-10-12 | Code |
| 7 | LOAE | 0.267 | Yes | Enhancing Automated Audio Captioning via Large L... | 2024-06-19 | Code |
| 8 | MQ-Cap | 0.266 | Yes | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | - |
| 9 | LAVCap | 0.262 | No | LAVCap: LLM-based Audio-Visual Captioning using ... | 2025-01-16 | Code |
| 10 | EnCLAP++-base | 0.257 | Yes | EnCLAP++: Analyzing the EnCLAP Framework for Opt... | 2024-09-02 | Code |
| 11 | EnCLAP-large | 0.2554 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Code |
| 12 | AutoCap | 0.253 | No | Taming Data and Transformers for Audio Generation | 2024-06-27 | Code |
| 13 | CNext-trans | 0.2527 | No | - | - | - |
| 14 | EnCLAP-base | 0.2473 | No | EnCLAP: Combining Neural Audio Codec and Audio-T... | 2024-01-31 | Code |
| 15 | VAST | 0.247 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 16 | Rethink-ACT (AST + TF + MIL) | 0.242 | No | - | - | - |
| 17 | VALOR | 0.231 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |