VSTaR-1M
Modalities: Texts, Videos. Introduced: 2024-07-08.
VSTaR-1M is a 1M-sample instruction tuning dataset created using Video-STaR from several existing video source datasets, described per file below.
The videos for VSTaR-1M are available from the respective source datasets.
VSTaR-1M is built from diverse tasks, with the goal of enhancing video-language alignment in Large Video-Language Models (LVLMs).
- kinetics700_tune_.json - Instruction tuning QA pairs for the Kinetics700 source dataset. Useful for increasing task diversity and for fine-grained activity recognition.
- starb_tune_.json - Instruction tuning QA pairs for the STAR-benchmark source dataset. Useful for temporal reasoning.
- finediving_tune_.json - Instruction tuning QA pairs for the FineDiving source dataset. An example of adapting LVLMs to a novel task (acting as an Olympic diving judge).
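A minimal sketch for loading the per-source files listed above and mixing them into one training pool. It assumes each file is a top-level JSON list of QA records; the actual record schema is not described here, so records are passed through untouched. The helper names `load_vstar_splits` and `mix_splits`, the split keys, and the added `"source"` tag are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path

# File names as listed in the dataset description. The split keys are
# hypothetical labels used only in this sketch.
SPLITS = {
    "kinetics700": "kinetics700_tune_.json",
    "starb": "starb_tune_.json",
    "finediving": "finediving_tune_.json",
}


def load_vstar_splits(root):
    """Load every split file found under `root`; missing files are skipped.

    Assumption: each file holds a top-level JSON list of QA records.
    """
    root = Path(root)
    data = {}
    for name, filename in SPLITS.items():
        path = root / filename
        if not path.exists():
            continue
        with path.open("r", encoding="utf-8") as f:
            records = json.load(f)
        if not isinstance(records, list):
            raise ValueError(f"{filename}: expected a top-level JSON list")
        data[name] = records
    return data


def mix_splits(data):
    """Concatenate the loaded splits, tagging each record with its source split.

    The {"source": ..., "record": ...} wrapper is a hypothetical format,
    not part of VSTaR-1M itself.
    """
    mixed = []
    for name, records in data.items():
        for rec in records:
            mixed.append({"source": name, "record": rec})
    return mixed
```

Usage: `pool = mix_splits(load_vstar_splits("path/to/VSTaR-1M"))`. Keeping the source tag makes it easy to reweight or ablate individual sources, for example dropping the FineDiving split when no diving-judge capability is needed.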