ImplicitQA
The ImplicitQA dataset was introduced in the paper *ImplicitQA: Going Beyond Frames Towards Implicit Video Reasoning*.
Project page: https://swetha5.github.io/ImplicitQA/
ImplicitQA is a novel benchmark specifically designed to test models on implicit reasoning in Video Question Answering (VideoQA). Unlike existing VideoQA benchmarks, which primarily focus on questions answerable from explicit visual content (actions, objects, and events directly observable within individual frames or short clips), ImplicitQA requires models to infer motives, causality, and relationships <u>across discontinuous frames</u>. This mirrors how humans understand creative and cinematic videos, which often employ storytelling techniques that deliberately leave certain events undepicted.
The dataset comprises 1,000 meticulously annotated QA pairs derived from over 320 high-quality creative video clips. These QA pairs are systematically categorized into key reasoning dimensions, including:
- Lateral Spatial Reasoning
- Vertical Spatial Reasoning
- Relative Depth and Proximity
- Viewpoint and Visibility
- Motion and Trajectory Dynamics
- Causal and Motivational Reasoning
- Social Interactions and Relationships
- Physical and Environmental Context
- Inferred Counting
The annotations are deliberately challenging: each was crafted to ensure high quality and to expose how difficult implicit reasoning remains for current VideoQA models.
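
Assuming the dataset is distributed through the Hugging Face Hub, a minimal sketch for loading it and tallying QA pairs per reasoning dimension might look like the following. The repository ID, the `test` split, and the `category`, `question`, `options`, and `answer` field names are illustrative assumptions, not confirmed by this card:

```python
from collections import Counter

from datasets import load_dataset

# A minimal loading sketch. The repository ID ("swetha5/ImplicitQA"), the
# split name, and the field names ("category", "question", "options",
# "answer") are assumptions for illustration -- check the project page
# for the actual identifiers.
ds = load_dataset("swetha5/ImplicitQA", split="test")

# Tally QA pairs per reasoning dimension.
counts = Counter(example["category"] for example in ds)
for dimension, n in counts.most_common():
    print(f"{dimension}: {n}")

# Inspect a single annotation.
example = ds[0]
print(example["question"])
print(example["options"], "->", example["answer"])
```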