Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VTimeLLM

Reported on 41 benchmarks across 9 tasks · 1 paper · 5 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
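To illustrate what exact-name matching implies, here is a minimal sketch. It assumes results are stored as simple records keyed by a model-name string; the record layout, the function `group_by_exact_name`, the second paper label, and the name `VTimeLLM-13B` are all hypothetical and are not taken from this site's actual implementation.

```python
from collections import defaultdict

# Hypothetical records. Only the name "VTimeLLM", arXiv:2311.18445 and the
# 2.85 mean come from this page; everything else is an illustrative placeholder.
results = [
    {"model": "VTimeLLM", "paper": "arXiv:2311.18445", "metric": "mean", "value": 2.85},
    {"model": "VTimeLLM", "paper": "hypothetical-paper-B", "metric": "mean", "value": None},
    {"model": "VTimeLLM-13B", "paper": "hypothetical-paper-B", "metric": "mean", "value": None},
]

def group_by_exact_name(records):
    """Group records by the exact, case-sensitive model-name string."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["model"]].append(record)
    return dict(grouped)

by_model = group_by_exact_name(results)

# Two different papers reusing the name "VTimeLLM" land on the same page,
# while a differently spelled variant is treated as a separate model.
print(sorted(by_model))
print(len(by_model["VTimeLLM"]))
```

Under this assumption, rows on a model page can come from different variants that happen to share a name, so the paper cited on each row is the reliable way to tell them apart.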

Reasoning · 22 results

  • Generative Visual Question Answering on VideoInstruct
    Detail Orientation · 2023-11-30
    3.1
    best: 3.56 (PPLLaVA-7B-dpo)
    SOTA
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    Detail Orientation · 2023-11-30
    3.1
    best: 3.56 (PPLLaVA-7B-dpo)
    SOTA
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    Consistency · 2023-11-30
    2.47
    best: 3.81 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    Contextual Understanding · 2023-11-30
    3.4
    best: 4.21 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    Correctness of Information · 2023-11-30
    2.78
    best: 3.85 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    Temporal Understanding · 2023-11-30
    2.49
    best: 3.23 (VLM-RLAIF)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    mean · 2023-11-30
    2.85
    best: 3.73 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    gpt-score · 2023-11-30
    3.4
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    gpt-score · 2023-11-30
    2.78
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    gpt-score · 2023-11-30
    3.1
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    gpt-score · 2023-11-30
    2.49
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Generative Visual Question Answering on VideoInstruct
    gpt-score · 2023-11-30
    2.47
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    Consistency · 2023-11-30
    2.47
    best: 3.81 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    Contextual Understanding · 2023-11-30
    3.4
    best: 4.21 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    Correctness of Information · 2023-11-30
    2.78
    best: 3.85 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    Temporal Understanding · 2023-11-30
    2.49
    best: 3.23 (VLM-RLAIF)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    mean · 2023-11-30
    2.85
    best: 3.73 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    gpt-score · 2023-11-30
    3.4
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    gpt-score · 2023-11-30
    2.78
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    gpt-score · 2023-11-30
    3.1
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    gpt-score · 2023-11-30
    2.49
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video-based Generative Performance Benchmarking on VideoInstruct
    gpt-score · 2023-11-30
    2.47
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
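
The VideoInstruct "mean" of 2.85 listed above appears consistent with a plain average of the five per-dimension scores (Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, Consistency). The following is only a quick arithmetic check under that assumption; the page itself does not state how the mean is computed, and the variable names are illustrative.

```python
# Per-dimension VideoInstruct scores for VTimeLLM, copied from the list above.
scores = {
    "Correctness of Information": 2.78,
    "Detail Orientation": 3.1,
    "Contextual Understanding": 3.4,
    "Temporal Understanding": 2.49,
    "Consistency": 2.47,
}
reported_mean = 2.85  # the "mean" row on this page

# Assumption: the reported mean is an unweighted average of the five scores.
computed_mean = sum(scores.values()) / len(scores)

# The page shows two decimals, so allow for rounding when comparing.
matches = round(computed_mean, 2) == reported_mean

print(f"computed {computed_mean:.3f}, reported {reported_mean}, match: {matches}")
# computed 2.848, reported 2.85, match: True
```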

Natural Language Processing · 17 results

  • Relation Extraction on Vinoground
    Video Score · 2023-11-30
    27
    best: 51 (GPT-4o (CoT))
    SOTA
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    Detail Orientation · 2023-11-30
    3.1
    best: 3.56 (PPLLaVA-7B-dpo)
    SOTA
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Temporal Relation Extraction on Vinoground
    Video Score · 2023-11-30
    27
    best: 51 (GPT-4o (CoT))
    SOTA
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Relation Extraction on Vinoground
    Group Score · 2023-11-30
    5.2
    best: 35 (GPT-4o (CoT))
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Relation Extraction on Vinoground
    Text Score · 2023-11-30
    19.4
    best: 59.2 (GPT-4o (CoT))
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    Consistency · 2023-11-30
    2.47
    best: 3.81 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    Contextual Understanding · 2023-11-30
    3.4
    best: 4.21 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    Correctness of Information · 2023-11-30
    2.78
    best: 3.85 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    Temporal Understanding · 2023-11-30
    2.49
    best: 3.23 (VLM-RLAIF)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    mean · 2023-11-30
    2.85
    best: 3.73 (PPLLaVA-7B-dpo)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    gpt-score · 2023-11-30
    3.4
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    gpt-score · 2023-11-30
    2.78
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    gpt-score · 2023-11-30
    3.1
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    gpt-score · 2023-11-30
    2.49
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Visual Question Answering (VQA) on VideoInstruct
    gpt-score · 2023-11-30
    2.47
    best: 4.21 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Temporal Relation Extraction on Vinoground
    Group Score · 2023-11-30
    5.2
    best: 35 (GPT-4o (CoT))
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Temporal Relation Extraction on Vinoground
    Text Score · 2023-11-30
    19.4
    best: 59.2 (GPT-4o (CoT))
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445

Computer Vision · 13 results

  • Video Captioning on ActivityNet Captions
    CIDEr · 2023-11-30
    27.6
    best: 39.3 (VideoCoCa)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Video Captioning on ActivityNet Captions
    SODA · 2023-11-30
    5.8
    best: 7.11 (GVL)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Dense Video Captioning on ActivityNet Captions
    CIDEr · 2023-11-30
    27.6
    best: 33.33 (GVL)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • Dense Video Captioning on ActivityNet Captions
    SODA · 2023-11-30
    5.8
    best: 7.11 (GVL)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Consistency · 2023-11-30
    2.35
    best: 2.59 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Contextual Understanding · 2023-11-30
    2.48
    best: 2.81 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Correctness of Information · 2023-11-30
    2.16
    best: 2.46 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Dense Captioning · 2023-11-30
    1.13
    best: 1.38 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Detail Orientation · 2023-11-30
    2.41
    best: 2.73 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Reasoning · 2023-11-30
    3.45
    best: 3.63 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Spatial Understanding · 2023-11-30
    2.29
    best: 2.8 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    Temporal Understanding · 2023-11-30
    1.46
    best: 1.78 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445
  • VCGBench-Diverse on VideoInstruct
    mean · 2023-11-30
    2.17
    best: 2.47 (VideoGPT+)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445

Other · 1 result

  • Video-based Generative Performance Benchmarking (Correctness of Information) on VideoInstruct
    gpt-score · 2023-11-30
    2.78
    best: 3.85 (PPLLaVA-7B)
    VTimeLLM: Empower LLM to Grasp Video Moments · arXiv:2311.18445