R2VQ

Recipe-to-Video Questions

Texts · Videos · Unknown · Introduced 2021-05-12

R2VQ is a dataset for testing competence-based machine comprehension over a multimodal collection of recipes in which the text is aligned with video.

In total, 51,331 cooking events are annotated. These events contain 19,201 explicit ingredients, 16,338 implicit ingredients, 12,316 explicit props, and 11,868 implicit props.
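To put the annotation counts in proportion, the short sketch below sums the explicit and implicit entities and computes how much of each category is implicit. The numbers are copied from the description above; the variable and function names are illustrative only and are not part of any R2VQ tooling.

```python
# Annotation counts copied from the dataset description.
cooking_events = 51_331
explicit_ingredients = 19_201
implicit_ingredients = 16_338
explicit_props = 12_316
implicit_props = 11_868


def implicit_share(explicit: int, implicit: int) -> float:
    """Fraction of a category's annotations that are implicit."""
    return implicit / (explicit + implicit)


# Totals per entity category.
ingredient_total = explicit_ingredients + implicit_ingredients  # 35,539
prop_total = explicit_props + implicit_props                    # 24,184

# Share of implicit mentions in each category.
implicit_ingredient_share = implicit_share(explicit_ingredients, implicit_ingredients)
implicit_prop_share = implicit_share(explicit_props, implicit_props)

# Average number of annotated entities per cooking event.
entities_per_event = (ingredient_total + prop_total) / cooking_events

print(f"ingredients: {ingredient_total:,} ({implicit_ingredient_share:.1%} implicit)")
print(f"props:       {prop_total:,} ({implicit_prop_share:.1%} implicit)")
print(f"entities per cooking event: {entities_per_event:.2f}")
```

Under these counts, a little under half of the annotated ingredients and props are implicit, which suggests that a substantial part of the annotation concerns entities not mentioned directly in the recipe text.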