Metric: METEOR (higher is better). Scores are shown as reported: rows 1-4 use a 0-100 scale, rows 5-7 a 0-1 scale.
| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VAST | 19.3 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 2 | Audio Flamingo (Pengi trainset) | 18.7 | Yes | Audio Flamingo: A Novel Audio Language Model wit... | 2024-02-02 | Code |
| 3 | VALOR | 17.4 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 4 | ZerAuCap | 9.4 | No | Zero-shot audio captioning with audio-language m... | 2023-11-14 | Code |
| 5 | SLAM-AAC | 0.197 | Yes | SLAM-AAC: Enhancing Audio Captioning with Paraph... | 2024-10-12 | Code |
| 6 | LOAE | 0.197 | Yes | Enhancing Automated Audio Captioning via Large L... | 2024-06-19 | Code |
| 7 | MQ-Cap | 0.192 | No | Enhancing Retrieval-Augmented Audio Captioning w... | 2024-10-14 | - |
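
The METEOR column mixes values such as 19.3 with values such as 0.197, so sorting on the raw numbers does not compare like with like. The following is a minimal sketch of putting the scores on one scale before ranking. It assumes that any reported value of at most 1.0 is a fraction to be multiplied by 100. That assumption is not confirmed by the table itself, and the helper names `to_percent` and `rerank` are hypothetical.

```python
# Rows copied from the table above: (model, METEOR as reported).
ROWS = [
    ("VAST", 19.3),
    ("Audio Flamingo (Pengi trainset)", 18.7),
    ("VALOR", 17.4),
    ("ZerAuCap", 9.4),
    ("SLAM-AAC", 0.197),
    ("LOAE", 0.197),
    ("MQ-Cap", 0.192),
]


def to_percent(score: float) -> float:
    """Assumption: a score of at most 1.0 is a fraction, so scale it by 100."""
    return score * 100 if score <= 1.0 else score


def rerank(rows):
    """Return (model, score on a 0-100 scale) pairs, highest score first."""
    normalized = [(model, round(to_percent(score), 1)) for model, score in rows]
    # sorted() is stable, so tied models keep their original relative order.
    return sorted(normalized, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for rank, (model, score) in enumerate(rerank(ROWS), start=1):
        print(f"{rank}\t{model}\t{score:.1f}")
```

Under this assumption, SLAM-AAC and LOAE (19.7) would sort above VAST (19.3), and MQ-Cap (19.2) would sort just below it. Check each paper's reported scale before relying on such a reordering.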