ImplicitQA
The ImplicitQA dataset was introduced in the paper *ImplicitQA: Going Beyond Frames Towards Implicit Video Reasoning*.
Project page: https://swetha5.github.io/ImplicitQA/
ImplicitQA is a novel benchmark specifically designed to test models on implicit reasoning in Video Question Answering (VideoQA). Unlike existing VideoQA benchmarks, which primarily focus on questions answerable from explicit visual content (actions, objects, and events directly observable within individual frames or short clips), ImplicitQA requires models to infer motives, causality, and relationships <u>across discontinuous frames</u>. This mirrors how humans understand creative and cinematic videos, which often employ storytelling techniques that deliberately leave certain events undepicted.
The dataset comprises 1,000 meticulously annotated QA pairs derived from over 320 high-quality creative video clips. These QA pairs are systematically categorized into key reasoning dimensions, including:
- Lateral Spatial Reasoning
- Vertical Spatial Reasoning
- Relative Depth and Proximity
- Viewpoint and Visibility
- Motion and Trajectory Dynamics
- Causal and Motivational Reasoning
- Social Interactions and Relationships
- Physical and Environmental Context
- Inferred Counting
The annotations are deliberately challenging: each was crafted to ensure high quality and to expose how difficult implicit reasoning remains for current VideoQA models.
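
Assuming the dataset is distributed through the Hugging Face Hub, a minimal sketch for loading it and tallying QA pairs per reasoning dimension might look like the following. The repository ID, the `test` split, and the `category`, `question`, `options`, and `answer` field names are illustrative assumptions, not confirmed by this card:

```python
from collections import Counter

from datasets import load_dataset

# A minimal loading sketch. The repository ID ("swetha5/ImplicitQA"), the
# split name, and the field names ("category", "question", "options",
# "answer") are assumptions for illustration -- check the project page
# for the actual identifiers.
ds = load_dataset("swetha5/ImplicitQA", split="test")

# Tally QA pairs per reasoning dimension.
counts = Counter(example["category"] for example in ds)
for dimension, n in counts.most_common():
    print(f"{dimension}: {n}")

# Inspect a single annotation.
example = ds[0]
print(example["question"])
print(example["options"], "->", example["answer"])
```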