ActivityNet Entities

Videos

ActivityNet-Entities augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase. These annotations can be used to train video description models and, importantly, to evaluate how well grounded ("true") such models are to the videos they describe.
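The following sketch illustrates the idea of measuring grounding. It checks whether a model's predicted box for each noun phrase overlaps the annotated box, using intersection-over-union (IoU). Several details are assumptions made for illustration and do not come from the dataset's official evaluation code: boxes in `(x1, y1, x2, y2)` pixel format, a 0.5 IoU threshold, and the hypothetical `GroundedPhrase` record and function names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class GroundedPhrase:
    """Hypothetical record: a noun phrase with its annotated and predicted boxes."""
    phrase: str
    gt_box: Box
    pred_box: Box


def grounding_accuracy(items: List[GroundedPhrase], threshold: float = 0.5) -> float:
    """Fraction of phrases whose predicted box reaches the (assumed) IoU threshold."""
    if not items:
        return 0.0
    hits = sum(1 for it in items if iou(it.gt_box, it.pred_box) >= threshold)
    return hits / len(items)


if __name__ == "__main__":
    phrases = [
        GroundedPhrase("a man", (0, 0, 10, 10), (0, 0, 10, 10)),     # exact match
        GroundedPhrase("a dog", (0, 0, 10, 10), (20, 20, 30, 30)),   # no overlap
    ]
    print(grounding_accuracy(phrases))  # 0.5
```

Here one of the two phrases is grounded correctly, so the accuracy is 0.5. A real evaluation would also need to match generated phrases to annotated ones and handle multiple frames per video, which this sketch leaves out.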

Source: https://github.com/facebookresearch/ActivityNet-Entities