Metric: METEOR (higher is better)
| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VALOR | 29.4 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 2 | IcoCap (ViT-B/16) | 25.7 | Yes | - | - | - |
| 3 | VASTA (Kinetics-backbone) | 25.32 | No | Diverse Video Captioning by Adaptive Spatio-temp... | 2022-08-19 | Code |
| 4 | CoCap (ViT/L14) | 25.3 | No | Accurate and Fast Compressed Video Captioning | 2023-09-22 | Code |
| 5 | IcoCap (ViT-B/32) | 24.6 | Yes | - | - | - |
| 6 | ORG-TRL | 22.2 | Yes | Object Relational Graph with Teacher-Recommended... | 2020-02-26 | - |
| 7 | NITS-VC | 18 | No | NITS-VC System for VATEX Video Captioning Challe... | 2020-06-07 | - |