Metric: BLEU-4 (higher is better)
| # | Model | BLEU-4 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Audio Flamingo | 14.3 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Code |
| 2 | Shaharabany et al. | 9.8 | Yes | Zero-Shot Audio Captioning via Audibility Guidance | 2023-09-07 | - |
| 3 | ZerAuCap | 6.8 | Yes | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 4 | MQ-Cap | 0.301 | Yes | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | - |
| 5 | LAVCap | 0.297 | No | LAVCap: LLM-based Audio-Visual Captioning using ... | 2025-01-16 | Code |
| 6 | VAST | 0.295 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 7 | Rethink-ACT (AST + TF + MIL) | 0.285 | No | - | - | - |
| 8 | VALOR | 0.27 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 9 | No audio (baseline) | 0 | No | - | - | Code |
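
The BLEU-4 column appears to mix two reporting scales. BLEU is bounded by 1 as a fraction, so the values 14.3, 9.8, and 6.8 can only be on a 0-100 scale, while the remaining entries look like 0-1 fractions. Under that assumption (not confirmed by the source), the sketch below puts every score on the 0-1 scale before sorting. The helper name `to_fraction` and the "greater than 1 means percent" rule are illustrative choices, not part of the leaderboard.

```python
# Rows copied from the table above: (model, BLEU-4 as reported).
reported = [
    ("Audio Flamingo", 14.3),
    ("Shaharabany et al.", 9.8),
    ("ZerAuCap", 6.8),
    ("MQ-Cap", 0.301),
    ("LAVCap", 0.297),
    ("VAST", 0.295),
    ("Rethink-ACT (AST + TF + MIL)", 0.285),
    ("VALOR", 0.27),
    ("No audio (baseline)", 0),
]


def to_fraction(score: float) -> float:
    """Return the score on a 0-1 scale.

    Assumption: any value above 1 was reported on a 0-100 scale.
    """
    return score / 100.0 if score > 1.0 else float(score)


# Re-rank on the common scale, highest BLEU-4 first.
normalized = sorted(
    ((model, to_fraction(score)) for model, score in reported),
    key=lambda row: row[1],
    reverse=True,
)

for rank, (model, score) in enumerate(normalized, start=1):
    print(f"{rank}. {model}: {score:.3f}")
```

If the scale assumption holds, the ordering changes considerably, so the rank column above is best read as a sort on the raw reported numbers. Scores obtained under different evaluation setups may still not be directly comparable after rescaling.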