Video Question Answering on NExT-QA (Efficient)
Metric: 1:1 Accuracy (higher is better)
Results
| # | Model | 1:1 Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ViLA (3B, 4 frames) | 74.4 | No | ViLA: Efficient Video-Language Alignment for Vid... | 2023-12-13 | Code |
| 2 | SeViLA (4 frames) | 73.8 | No | Self-Chained Image-Language Model for Video Loca... | 2023-05-11 | Code |