EgoSchema
Videos
Introduced 2023-08-17
EgoSchema is a very long-form video question-answering dataset and a benchmark for evaluating the long video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5,000 human-curated multiple-choice question-answer pairs spanning over 250 hours of real video data and covering a very broad range of natural human activity and behavior.
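Because the questions are multiple choice, the accuracy metric reported on the benchmarks can be illustrated with a short scoring routine. The following is a minimal sketch, not the official evaluation code: the function name `multiple_choice_accuracy`, the question IDs, and the dictionary layout (question ID mapped to a chosen option index) are all assumptions made for illustration, and the toy data is invented.

```python
from typing import Mapping


def multiple_choice_accuracy(
    predictions: Mapping[str, int],
    answers: Mapping[str, int],
) -> float:
    """Return the fraction of questions whose predicted option matches the key.

    Both mappings go from a question ID to an option index. This layout is a
    hypothetical choice for the sketch, not the dataset's actual file format.
    Questions missing from `predictions` are counted as incorrect.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(
        1 for qid, gold in answers.items() if predictions.get(qid) == gold
    )
    return correct / len(answers)


# Invented toy data: four questions, one of which has no prediction.
answers = {"q1": 0, "q2": 3, "q3": 1, "q4": 2}
predictions = {"q1": 0, "q2": 3, "q3": 4}

score = multiple_choice_accuracy(predictions, answers)
print(f"accuracy = {score:.2f}")  # prints "accuracy = 0.50"
```

Counting unanswered questions as incorrect keeps the denominator fixed at the number of questions in the answer key, so scores from different systems stay comparable.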
Benchmarks
Related Benchmarks
EgoSchema (fullset)/Question Answering/Accuracy
EgoSchema (fullset)/Video Question Answering/Accuracy
EgoSchema (subset)/Question Answering/Accuracy
EgoSchema (subset)/Question Answering/Inference Speed (s)
EgoSchema (subset)/Video Question Answering/Accuracy
EgoSchema (subset)/Video Question Answering/Inference Speed (s)