TVQA+

Texts, Videos, Unknown

TVQA+ provides 310.8K bounding boxes that link objects depicted in the video frames to the visual concepts mentioned in the questions and answers.
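To make the linking idea concrete, here is a minimal sketch of how such grounding annotations could be represented and queried. The `BoundingBox` record, its field names, and the `ground_concepts` helper are hypothetical and not the actual TVQA+ file schema. The word-to-label string matching is also a simplifying assumption; in the dataset, the links between boxes and concepts come from annotation, not from matching text.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; the real TVQA+ annotation files may differ.
@dataclass(frozen=True)
class BoundingBox:
    frame_id: int   # index of the annotated video frame
    label: str      # visual concept the box depicts, e.g. a noun from a QA pair
    left: float
    top: float
    width: float
    height: float


def ground_concepts(text: str, boxes: list[BoundingBox]) -> dict[str, list[BoundingBox]]:
    """Map each word in `text` that names an annotated concept to its boxes.

    Simplification (assumption): concepts are matched by lowercase string
    equality between question/answer words and box labels.
    """
    by_label: dict[str, list[BoundingBox]] = defaultdict(list)
    for box in boxes:
        by_label[box.label.lower()].append(box)

    # Strip trailing punctuation and lowercase to normalize question words.
    words = {w.strip("?.,!").lower() for w in text.split()}
    # `in` on a defaultdict does not create keys, so unmatched words are skipped.
    return {w: by_label[w] for w in sorted(words) if w in by_label}


boxes = [
    BoundingBox(3, "cup", 10, 20, 30, 40),
    BoundingBox(5, "cup", 12, 22, 30, 40),
    BoundingBox(5, "Sheldon", 0, 0, 50, 100),
]
grounded = ground_concepts("What does Sheldon put in the cup?", boxes)
print({word: [b.frame_id for b in bs] for word, bs in grounded.items()})
```

Grouping boxes by label first means each question word needs only one dictionary lookup, and keeping the frame index on every box preserves the temporal side of the spatio-temporal grounding.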

Source: TVQA+: Spatio-Temporal Grounding for Video Question Answering
Image Source: https://github.com/jayleicn/TVQAplus