Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video-based Generative Performance Benchmarking on VideoInstruct

Metric: Temporal Understanding (higher is better)

Results

| # | Model | Temporal Understanding | Extra Data | Paper | Date | Code |
|---|-------|------------------------|------------|-------|------|------|
| 1 | VLM-RLAIF | 3.23 | No | Tuning Large Multimodal Models for Videos using ... | 2024-02-06 | Yes |
| 2 | PPLLaVA-7B-dpo | 3.21 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Yes |
| 3 | PPLLaVA-7B | 3 | No | PPLLaVA: Varied Video Sequence Understanding Wit... | 2024-11-04 | Yes |
| 4 | ST-LLM-7B | 2.93 | No | ST-LLM: Large Language Models Are Effective Temp... | 2024-03-30 | Yes |
| 5 | IG-VLM-GPT4v | 2.89 | No | An Image Grid Can Be Worth a Video: Zero-shot Vi... | 2024-03-27 | Yes |
| 6 | VideoGPT+ | 2.83 | No | VideoGPT+: Integrating Image and Video Encoders ... | 2024-06-13 | Yes |
| 7 | CAT-7B | 2.81 | No | CAT: Enhancing Multimodal Large Language Model t... | 2024-03-07 | Yes |
| 8 | LITA-13B | 2.68 | No | LITA: Language Instructed Temporal-Localization ... | 2024-03-27 | Yes |
| 9 | PLLaVA-34B | 2.67 | No | PLLaVA : Parameter-free LLaVA Extension from Ima... | 2024-04-25 | Yes |
| 10 | VideoChat2 | 2.66 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Yes |
| 11 | VideoChat2_HD_mistral | 2.65 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Yes |
| 12 | LLaMA-VID-13B (2 Token) | 2.58 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Yes |
| 13 | VTimeLLM | 2.49 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Yes |
| 14 | LLaMA-VID-7B (2 Token) | 2.46 | No | LLaMA-VID: An Image is Worth 2 Tokens in Large L... | 2023-11-28 | Yes |
| 15 | Chat-UniVi | 2.39 | No | Chat-UniVi: Unified Visual Representation Empowe... | 2023-11-14 | Yes |
| 16 | BT-Adapter | 2.34 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Yes |
| 17 | BT-Adapter (zero-shot) | 2.13 | No | BT-Adapter: Video Conversation is Feasible Witho... | 2023-09-27 | Yes |
| 18 | Video-ChatGPT | 1.98 | No | Video-ChatGPT: Towards Detailed Video Understand... | 2023-06-08 | Yes |
| 19 | LLaMA Adapter | 1.98 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Yes |
| 20 | Video Chat | 1.94 | No | VideoChat: Chat-Centric Video Understanding | 2023-05-10 | Yes |
| 21 | Video LLaMA | 1.82 | No | Video-LLaMA: An Instruction-tuned Audio-Visual L... | 2023-06-05 | Yes |
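
The ordering above follows directly from the metric: Temporal Understanding is "higher is better", so entries are sorted by score in descending order. A minimal sketch of that ranking, using a handful of (model, score) pairs copied from the results; the `rank_by_score` helper is a hypothetical name introduced here for illustration, not part of any Papers With Code tooling:

```python
# A few (model, temporal_understanding) pairs copied from the results above,
# deliberately listed out of order.
results = [
    ("Video-ChatGPT", 1.98),
    ("VLM-RLAIF", 3.23),
    ("ST-LLM-7B", 2.93),
    ("PPLLaVA-7B-dpo", 3.21),
    ("Video LLaMA", 1.82),
]


def rank_by_score(rows):
    """Return (rank, model, score) tuples, best score first.

    Hypothetical helper: sorts descending because the metric is
    "higher is better". Python's sort is stable, so tied scores
    keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, model, score)
            for rank, (model, score) in enumerate(ordered, start=1)]


ranked = rank_by_score(results)
for rank, model, score in ranked:
    print(f"{rank:>2}  {model:<16} {score:.2f}")
```

Because the sort is stable, tied entries (such as the two models at 1.98 in the full results) would stay in whatever order they were supplied; the leaderboard's own tie-breaking rule is not stated in the source.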