Metric: SPIDEr (higher is better)
| # | Model | SPIDEr | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ZerAuCap | 9.7 | No | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 2 | SLAM-AAC | 0.332 | Yes | SLAM-AAC: Enhancing Audio Captioning with Paraph... | 2024-10-12 | Code |
| 3 | LOAE | 0.33 | Yes | Enhancing Automated Audio Captioning via Large L... | 2024-06-19 | Code |
| 4 | MQ-Cap | 0.319 | No | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | - |
| 5 | Ensemble | 0.318 | Yes | - | - | - |
| 6 | Audio Flamingo (Pengi trainset) | 0.312 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Code |
| 7 | Ensemble-RL | 0.295 | Yes | - | - | Code |
| 8 | Qwen-Audio | 0.288 | Yes | Qwen-Audio: Advancing Universal Audio Understand... | 2023-11-14 | Code |
| 9 | Ensemble | 0.207 | No | The NTT DCASE2020 Challenge Task 6 system: Autom... | 2020-07-01 | - |