IntentQA
Reported on 4 benchmarks across 1 task
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Reasoning: 4 results
- Accuracy: 57.6; best: 83.4 (VideoChat2_HD_mistral)
- 65.5; best: 90 (VideoChat2_HD_mistral)
- 58.4; best: 84 (VideoChat2_HD_mistral)
- 50.5; best: 79.1 (Human)