MSRVTT-QA
MSRVTT-QA (also written MSR-VTT-QA) is a benchmark for Visual Question Answering (VQA) over video, built on the MSR-VTT (Microsoft Research Video to Text) dataset. It evaluates a model's ability to answer natural-language questions about the content of MSR-VTT video clips. Question answering is one of several tasks MSR-VTT is used for, along with Video Retrieval, Video Captioning, Zero-Shot Video Question Answering, Zero-Shot Video Retrieval, and Text-to-Video Generation.
Benchmarks
Question Answering/Accuracy
Question Answering/Confidence Score
Video Question Answering/Accuracy
Video Question Answering/Confidence Score
Visual Question Answering/Test Accuracy
Visual Question Answering/Accuracy
Visual Question Answering (VQA)/Accuracy
Visual Question Answering (VQA)/Test Accuracy
Zero-Shot Learning/Accuracy
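
Most of the benchmarks above report accuracy. As a rough illustration only, accuracy on a question answering benchmark can be computed as the fraction of predicted answers that exactly match the reference answer. The sketch below assumes single-string answers and a simple lowercase-and-trim normalization; the function name `exact_match_accuracy` and the normalization are assumptions for illustration, not the official MSRVTT-QA evaluation protocol.

```python
def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions that equal their reference answer.

    Assumption: answers are compared after trimming whitespace and
    lowercasing. Real leaderboards may normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical example answers, not taken from the dataset.
score = exact_match_accuracy(["Dog", "cat "], ["dog", "car"])
```

Here one of the two hypothetical predictions matches its reference after normalization, so `score` is 0.5.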