Metric: Average Accuracy (higher is better)
| # | Model | Average Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GF (sup) - Faster RCNN | 55.08 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 | Code |
| 2 | MIST - CLIP | 54.39 | No | MIST: Multi-modal Iterative Spatial-Temporal Tra... | 2022-12-19 | Code |
| 3 | GF (uns) - S3D | 53.33 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 | Code |
| 4 | SViTT | 52.7 | No | SViTT: Temporal Learning of Sparse Video-Text Tr... | 2023-04-18 | Code |
| 5 | MIST - AIO | 50.96 | No | MIST: Multi-modal Iterative Spatial-Temporal Tra... | 2022-12-19 | Code |
| 6 | SHG-VQA (trained from scratch) | 49.2 | No | Learning Situation Hyper-Graphs for Video Questi... | 2023-04-18 | Code |
| 7 | AIO - ViT | 48.59 | No | Glance and Focus: Memory Prompting for Multi-Eve... | 2024-01-03 | Code |
| 8 | MMTF | 44.36 | No | - | - | - |