
VSTaR-1M is a 1M instruction tuning dataset, created using Video-STaR, with the source datasets Kinetics700, STAR-benchmark, and FineDiving.

The videos for VSTaR-1M can be found in the links above.

VSTaR-1M is built from diverse tasks with the goal of enhancing video-language alignment in Large Video-Language Models (LVLMs).

kinetics700_tune_.json - Instruction tuning QA pairs for the Kinetics700 source dataset. Good for increasing diversity and for more fine-grained activity recognition.
starb_tune_.json - Instruction tuning QA pairs for the STAR-benchmark source dataset. Good for temporal reasoning.
finediving_tune_.json - Instruction tuning QA pairs for the FineDiving source dataset. An example of adapting LVLMs to novel tasks (here, judging Olympic diving).
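
The record schema of these files is not described here, so it is worth inspecting each file before writing a training loader. The sketch below is one way to do that, assuming each file is a JSON list of records (or a single JSON object) and that the files sit in a local directory of your choosing. The helper names `load_tune_file`, `summarize`, and `summarize_available` are hypothetical and not part of any VSTaR-1M tooling.

```python
import json
from pathlib import Path

# File names as listed above; where they are stored locally is an assumption.
TUNE_FILES = [
    "kinetics700_tune_.json",
    "starb_tune_.json",
    "finediving_tune_.json",
]


def load_tune_file(path):
    """Load one tuning file and return its records as a list.

    Assumes the file holds a JSON list of records or a single JSON
    object; the actual schema is not documented in this card.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def summarize(records):
    """Return the record count and the sorted keys of the first record."""
    first = records[0] if records else {}
    keys = sorted(first) if isinstance(first, dict) else []
    return {"count": len(records), "keys": keys}


def summarize_available(data_dir):
    """Summarize every listed tuning file that exists under data_dir."""
    data_dir = Path(data_dir)
    return {
        name: summarize(load_tune_file(data_dir / name))
        for name in TUNE_FILES
        if (data_dir / name).exists()
    }


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory holding the files.
    for name, info in summarize_available(".").items():
        print(name, info)
```

Files missing from the directory are skipped, so the same call works if only one source dataset has been downloaded. The reported keys show which fields a given file actually contains before any are relied on.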