MSRVTT-QA

MSR-VTT-QA is a question answering benchmark built on the MSR-VTT (Microsoft Research Video to Text) dataset. It evaluates how well models answer natural-language questions about the content of MSR-VTT videos. Question answering is one of several tasks for which MSR-VTT is used; the others include Video Retrieval, Video Captioning, Zero-Shot Video Question Answering, Zero-Shot Video Retrieval, and Text-to-Video Generation.

Benchmarks

Question Answering / Accuracy
Question Answering / Confidence Score
Video Question Answering / Accuracy
Video Question Answering / Confidence Score
Visual Question Answering / Test Accuracy
Visual Question Answering / Accuracy
Visual Question Answering (VQA) / Accuracy
Visual Question Answering (VQA) / Test Accuracy
Zero-Shot Learning / Accuracy
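
Most of the benchmark entries above report accuracy. As a rough illustration only, the sketch below computes accuracy as the fraction of predicted answers that match the reference answer exactly, after lowercasing and trimming whitespace. The function name, the normalization, and the sample answers are all assumptions made for this example; individual leaderboards may define and compute accuracy differently.

```python
def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference answer.

    Hypothetical helper: compares strings after lowercasing and stripping
    surrounding whitespace. This normalization is an assumption, not a
    documented MSR-VTT-QA evaluation rule.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Made-up answers for three questions; two of the three match.
predicted = ["Dog", "two", "red"]
expected = ["dog", "three", "red "]
score = exact_match_accuracy(predicted, expected)
print(f"accuracy = {score:.3f}")
```

Exact matching is the simplest choice for short, single-word answers; models that produce free-form text generally need a more lenient comparison.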