Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MAD

Texts, Videos | Introduced 2021-12-01

MAD (Movie Audio Descriptions) is an automatically curated, large-scale dataset for natural language grounding in videos, also known as natural language moment retrieval. MAD exploits the audio descriptions available for mainstream movies. These descriptions are written for visually impaired audiences and are therefore highly descriptive of the visual content on screen. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video. It provides a unique setup for video grounding because the visual stream is truly untrimmed: the average video duration is 110 minutes, two orders of magnitude longer than in legacy datasets.

Take a look at the paper for additional information.

From the authors on availability: "Due to copyright constraints, MAD’s videos will not be publicly released. However, we will provide all necessary features for our experiments’ reproducibility and promote future research in this direction"

Benchmarks

Video: R@1,IoU=0.1; R@5,IoU=0.1; R@10,IoU=0.1; R@100,IoU=0.1; R@50,IoU=0.1; R@1,IoU=0.3; R@5,IoU=0.3; R@1,IoU=0.5; R@10,IoU=0.3; R@10,IoU=0.5; R@100,IoU=0.3; R@100,IoU=0.5; R@5,IoU=0.5; R@50,IoU=0.3; R@50,IoU=0.5

Video Grounding: R@1,IoU=0.1; R@5,IoU=0.1; R@10,IoU=0.1; R@100,IoU=0.1; R@50,IoU=0.1; R@1,IoU=0.3; R@5,IoU=0.3

Video Retrieval: R@1,IoU=0.1; R@5,IoU=0.1; R@10,IoU=0.1; R@100,IoU=0.1; R@50,IoU=0.1; R@1,IoU=0.3; R@5,IoU=0.3
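The benchmark names follow the "R@K, IoU=θ" convention common in moment retrieval: a query counts as a hit if any of its top-K predicted moments overlaps the ground-truth moment with temporal IoU of at least θ. The sketch below illustrates that convention only; it is not the official MAD evaluation code. The function names, the (start, end) representation in seconds, the use of `>=` at the threshold, and the percentage output are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# A moment is a (start_sec, end_sec) pair; this representation is an assumption.
Moment = Tuple[float, float]


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_preds: Sequence[Sequence[Moment]],
    gts: Sequence[Moment],
    k: int,
    iou_threshold: float,
) -> float:
    """Percentage of queries whose top-k predictions contain a hit.

    ranked_preds[i] holds the predictions for query i, best first.
    gts[i] is the single ground-truth moment for query i.
    """
    if not gts:
        return 0.0
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)
```

With two hypothetical queries, `recall_at_k(preds, gts, k=1, iou_threshold=0.3)` would read as "R@1, IoU=0.3". Because MAD's videos average 110 minutes, even a loose threshold such as IoU=0.1 requires localizing a moment of a few seconds within a very long timeline.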

Statistics

Papers: 36
Benchmarks: 29

Links

Homepage

Tasks

Moment Retrieval, Natural Language Moment Retrieval, Natural Language Visual Grounding, Video, Video Grounding, Video Retrieval