Metric: ROUGE-L (higher is better)
| # | Model | ROUGE-L | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VLTinT (ae-test split) C3D/Ling | 36.56 | No | VLTinT: Visual-Linguistic Transformer-in-Transfo... | 2022-11-28 | Code |
| 2 | VLCap (ae-test split) - Appearance + Language | 35.99 | No | VLCap: Vision-Language with Contrastive Learning... | 2022-06-26 | Code |
| 3 | VideoCoCa | 35 | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 4 | COOT (ae-test split) - Only Appearance features | 31.45 | No | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
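
For readers unfamiliar with the metric, ROUGE-L scores a generated caption against a reference by the length of their longest common subsequence (LCS) of tokens. The following is a minimal sketch, not the evaluation code behind the numbers above: the function names `lcs_length` and `rouge_l` are hypothetical, tokenization is plain lowercase whitespace splitting, and `beta=1.0` (a balanced F-measure) is an assumption, since the table does not say which variant or toolkit each paper used. The sketch returns a value in [0, 1], whereas the scores listed above appear to be on a 0-100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure in [0, 1].

    Assumptions (not taken from the leaderboard): whitespace tokenization,
    lowercasing, a single reference, and beta=1.0.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (recall + b2 * precision)


if __name__ == "__main__":
    print(rouge_l("the cat sat", "the dog sat"))
```

Published results are typically computed with an established evaluation toolkit and averaged over a whole test split, so values from this sketch are not directly comparable to the table.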