Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal Relation Extraction on Vinoground

Metric: Video Score (higher is better)


Results

| # | Model | Video Score | Extra Data | Paper | Date | Code |
|---|-------|-------------|------------|-------|------|------|
| 1 | GPT-4o (CoT) | 51 | No | - | - | - |
| 2 | GPT-4o | 38.2 | No | - | - | - |
| 3 | LLaVA-OneVision-Qwen2-72B | 35.2 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 4 | Qwen2-VL-72B | 32.6 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 5 | Qwen2-VL-7B | 32.4 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code |
| 6 | LLaVA-OneVision-Qwen2-7B | 29.4 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 7 | MiniCPM-2.6 | 29.2 | No | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | 2024-08-03 | Code |
| 8 | Claude 3.5 Sonnet | 28.8 | No | - | - | - |
| 9 | InternLM-XC-2.5 (CoT) | 28.4 | No | InternLM-XComposer-2.5: A Versatile Large Vision... | 2024-07-03 | Code |
| 10 | InternLM-XC-2.5 | 27.8 | No | InternLM-XComposer-2.5: A Versatile Large Vision... | 2024-07-03 | Code |
| 11 | Gemini-1.5-Pro (CoT) | 27.6 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 12 | VTimeLLM | 27 | No | VTimeLLM: Empower LLM to Grasp Video Moments | 2023-11-30 | Code |
| 13 | LLaVA-NeXT-Video-7B (CoT) | 26.2 | No | - | - | - |
| 14 | Video-LLaVA-7B | 25.8 | No | Video-LLaVA: Learning United Visual Representati... | 2023-11-16 | Code |
| 15 | MA-LMM-Vicuna-7B | 25.6 | No | MA-LMM: Memory-Augmented Large Multimodal Model ... | 2024-04-08 | Code |
| 16 | LLaVA-NeXT-Video-7B | 25.6 | No | - | - | - |
| 17 | Gemini-1.5-Pro | 22.6 | No | Gemini 1.5: Unlocking multimodal understanding a... | 2024-03-08 | Code |
| 18 | Phi-3.5-Vision | 22.4 | No | - | - | - |
| 19 | LLaVA-NeXT-Video-34B (CoT) | 22.2 | No | - | - | - |
| 20 | VideoLLaMA2-72B | 21.8 | No | VideoLLaMA 2: Advancing Spatial-Temporal Modelin... | 2024-06-11 | Code |
| 21 | LLaVA-NeXT-Video-34B | 21.2 | No | - | - | - |
| 22 | LanguageBind | 5 | No | LanguageBind: Extending Video-Language Pretraini... | 2023-10-03 | Code |
| 23 | ImageBind | 3.4 | No | ImageBind: One Embedding Space To Bind Them All | 2023-05-09 | Code |
| 24 | VideoCLIP | 2.8 | No | VideoCLIP: Contrastive Pre-training for Zero-sho... | 2021-09-28 | Code |
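
Several models in the results appear twice, once with plain prompting and once with a "(CoT)" suffix. The sketch below is one way to read off the difference between each such pair. It is illustrative only: the `scores` dictionary is copied by hand from the results above, and the helper name `cot_gains` and the suffix-matching rule are assumptions made for this example, not part of the leaderboard or of any official tooling.

```python
# Video Scores copied from the leaderboard rows that have both a plain
# entry and a "(CoT)" entry. Higher is better.
scores = {
    "GPT-4o": 38.2,
    "GPT-4o (CoT)": 51.0,
    "InternLM-XC-2.5": 27.8,
    "InternLM-XC-2.5 (CoT)": 28.4,
    "Gemini-1.5-Pro": 22.6,
    "Gemini-1.5-Pro (CoT)": 27.6,
    "LLaVA-NeXT-Video-7B": 25.6,
    "LLaVA-NeXT-Video-7B (CoT)": 26.2,
    "LLaVA-NeXT-Video-34B": 21.2,
    "LLaVA-NeXT-Video-34B (CoT)": 22.2,
}


def cot_gains(scores, suffix=" (CoT)"):
    """Return {base model: CoT score minus plain score} for paired entries.

    Hypothetical helper: pairs are matched purely by the name suffix.
    """
    gains = {}
    for name, cot_score in scores.items():
        if not name.endswith(suffix):
            continue
        base = name[: -len(suffix)]
        if base in scores:
            # Round to one decimal to match the leaderboard's precision
            # and to hide binary floating-point noise.
            gains[base] = round(cot_score - scores[base], 1)
    return gains


gains = cot_gains(scores)
# Print the pairs from largest to smallest difference.
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {gain:+.1f}")
```

The differences are simple subtractions of the listed scores and say nothing about statistical significance, since the leaderboard does not report variance or sample counts.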