V2C

Video-to-Commonsense

Texts, Videos

Contains ~9K videos of human agents performing various actions, annotated with three types of commonsense descriptions: intentions, effects, and attributes.