TED VCR
The TED VCR Video Retrieval Dataset is a multimodal collection derived from publicly available TED Talks. It contains thousands of talks, filtered to retain only those with meaningful topic labels; the resulting taxonomy is multi-label and long-tailed. For each talk the dataset provides three textual channels used in VCR retrieval experiments: automatic speech recognition (ASR) transcripts, slide- and scene-level OCR text, and frame-level visual captions. The data are split 80% train, 10% validation, and 10% test while preserving the original topic distribution, leaving 542 talks as a held-out test set. Two ready-to-download archives accompany the release: 4.2 GB of trimmed MP4 videos with metadata and 1.8 GB of pre-computed CLIP and Whisper embeddings, both shared under the CC BY-NC-ND 4.0 license (non-commercial, no derivatives).
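The description above does not say how the 80/10/10 split was produced. As a rough illustration of what a split that preserves a multi-label topic distribution can look like, here is a minimal Python sketch of a simplified greedy variant of iterative stratification. The function name `stratified_split`, the `talk_topics` mapping, and the toy topics are hypothetical and not part of the release; the actual procedure used for the dataset may differ.

```python
import random
from collections import Counter


def stratified_split(talk_topics, fractions=(0.8, 0.1, 0.1), seed=0):
    """Greedy multi-label split (illustrative sketch, not the dataset's method).

    talk_topics maps a talk id to its list of topic labels.
    Returns one list of talk ids per entry in `fractions`.
    """
    rng = random.Random(seed)
    ids = sorted(talk_topics)
    rng.shuffle(ids)

    # How many talks carry each topic.
    label_freq = Counter(t for i in ids for t in set(talk_topics[i]))

    # Remaining quota per split: overall size and per-topic count.
    size_left = [f * len(ids) for f in fractions]
    topic_left = [{t: f * n for t, n in label_freq.items()} for f in fractions]

    def rarest_topic(talk_id):
        # Sort first so ties are broken deterministically.
        topics = sorted(set(talk_topics[talk_id]))
        return min(topics, key=lambda t: label_freq[t], default=None)

    def rarity(talk_id):
        topic = rarest_topic(talk_id)
        return label_freq[topic] if topic is not None else len(ids) + 1

    splits = [[] for _ in fractions]
    # Place talks with rare topics first, since they are hardest to balance.
    for talk_id in sorted(ids, key=rarity):
        topic = rarest_topic(talk_id)

        def need(k):
            topic_need = topic_left[k][topic] if topic is not None else 0.0
            return (topic_need, size_left[k])

        # Pick the split that still needs this talk's rarest topic the most.
        best = max(range(len(fractions)), key=need)
        splits[best].append(talk_id)
        size_left[best] -= 1
        for t in set(talk_topics[talk_id]):
            topic_left[best][t] -= 1
    return splits


if __name__ == "__main__":
    # Toy data: two common topics and one rare topic (all hypothetical).
    toy = {
        f"talk_{i:03d}": (["science"] if i % 2 == 0 else ["art"])
        + (["music"] if i % 10 == 0 else [])
        for i in range(100)
    }
    train_ids, val_ids, test_ids = stratified_split(toy)
    print(len(train_ids), len(val_ids), len(test_ids))
```

Ordering talks by their rarest topic is the key design choice here: in a long-tail taxonomy, a purely random split can easily leave a rare topic absent from the validation or test set, whereas assigning rare-topic talks first gives each split a chance to receive its share before the quotas fill up.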