VSTaR-1M

Texts, Videos · Introduced 2024-07-08

VSTaR-1M is a 1M instruction-tuning dataset created using Video-STaR, with the source datasets:

The videos for VSTaR-1M can be found in the links above.

VSTaR-1M is built from diverse tasks, with the goal of enhancing video-language alignment in Large Video-Language Models (LVLMs).

  • kinetics700_tune_.json - Instruction tuning QA pairs for the Kinetics700 source dataset. Good for increasing diversity and for more fine-grained activity recognition.
  • starb_tune_.json - Instruction tuning QA pairs for the STAR-benchmark source dataset. Good for temporal reasoning.
  • finediving_tune_.json - Instruction tuning QA pairs for the FineDiving source dataset. Example of adapting LVLMs for novel tasks (Olympic diving judge).
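
A quick way to get oriented with these files is to check their top-level shape before writing any training code. The sketch below is only illustrative: the record schema is not described here, so the helper (the hypothetical `summarize_json`) makes no assumption about field names and simply reports what it finds. It also assumes the files have been downloaded into the current directory.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Report the top-level shape of a JSON file without assuming its schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "type": type(data).__name__,
        # Number of records (list) or top-level keys (dict); 1 for a scalar.
        "count": len(data) if isinstance(data, (list, dict)) else 1,
    }

    # If the file is a list of records, expose the field names of the first one.
    first = data[0] if isinstance(data, list) and data else None
    summary["first_record_keys"] = sorted(first) if isinstance(first, dict) else None
    return summary


if __name__ == "__main__":
    # File names as listed above; their location on disk is an assumption.
    for name in ("kinetics700_tune_.json", "starb_tune_.json", "finediving_tune_.json"):
        if Path(name).exists():
            print(name, summarize_json(name))
        else:
            print(name, "not found in the current directory")
```

Inspecting the keys of the first record should show how the QA pairs and video references are stored, which is worth confirming before mixing the three files into a single tuning set.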