Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

InternVideo

Reported on 93 benchmarks across 15 tasks · 2 papers · 73 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
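
To illustrate what exact-name matching implies, here is a minimal sketch. It is not the site's actual code: the row layout and the `group_by_exact_name` helper are assumptions made for illustration, and the lowercase `internvideo` row is a hypothetical entry added only to show that a differently spelled name would not be merged.

```python
from collections import defaultdict

# Hypothetical result rows: (model name, benchmark, metric, value).
# The first three values appear on this page; the last row is invented
# to show how a spelling variant is treated.
rows = [
    ("InternVideo", "Kinetics-400", "Acc@1", 91.1),
    ("InternVideo", "HACS", "Average-mAP", 41.55),
    ("InternVideo2-6B", "VATEX", "video-to-text R@1", 89.3),
    ("internvideo", "HACS", "Average-mAP", 41.55),
]

def group_by_exact_name(rows):
    """Group rows by the model name string, with no normalization."""
    grouped = defaultdict(list)
    for name, benchmark, metric, value in rows:
        grouped[name].append((benchmark, metric, value))
    return dict(grouped)

grouped = group_by_exact_name(rows)
```

Under this assumption, `grouped["InternVideo"]` holds only the two rows whose name is spelled exactly that way; variants with a different suffix or casing land under their own keys, and two papers that reuse one name would be merged into a single key.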

Computer Vision · 67 results

  • Video on HACS
    Average-mAP · 2022-12-06
    41.55
    best: 45.8 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on FineAction
    mAP · 2022-12-06
    17.57
    best: 29.6 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on VATEX
    text-to-video R@1 · 2022-12-06
    71.1
    best: 87.7 (GRAM)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on VATEX
    video-to-text R@1 · 2022-12-06
    87.2
    best: 89.3 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on ActivityNet
    text-to-video R@1 · uses extra data · 2022-12-06
    62.2
    best: 74.1 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on ActivityNet
    video-to-text R@1 · uses extra data · 2022-12-06
    62.8
    best: 69.7 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on DiDeMo
    text-to-video R@1 · uses extra data · 2022-12-06
    57.9
    best: 74.2 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on DiDeMo
    video-to-text R@1 · uses extra data · 2022-12-06
    59.1
    best: 71.9 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on MSR-VTT
    text-to-video R@1 · uses extra data · 2022-12-06
    55.2
    best: 64 (GRAM)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on LSMDC
    video-to-text R@1 · uses extra data · 2022-12-06
    34.9
    best: 46.7 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on MSVD
    video-to-text R@1 · uses extra data · 2022-12-06
    76.3
    best: 85.2 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on Kinetics-400
    Acc@1 · 2022-12-06
    91.1
    best: 93.6 (OmniVec2)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Temporal Action Localization on HACS
    Average-mAP · 2022-12-06
    41.55
    best: 45.8 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Temporal Action Localization on FineAction
    mAP · 2022-12-06
    17.57
    best: 29.6 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Localization on HACS
    Average-mAP · 2022-12-06
    41.55
    best: 45.8 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Localization on FineAction
    mAP · 2022-12-06
    17.57
    best: 29.6 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Localization on AVA-Kinetics
    val mAP · uses extra data · 2022-12-06
    41.01
    best: 42.6 (VideoMAE V2-g)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on VATEX
    text-to-video R@1 · 2022-12-06
    71.1
    best: 87.7 (GRAM)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on VATEX
    video-to-text R@1 · 2022-12-06
    87.2
    best: 89.3 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on ActivityNet
    text-to-video R@1 · uses extra data · 2022-12-06
    62.2
    best: 74.1 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on ActivityNet
    video-to-text R@1 · uses extra data · 2022-12-06
    62.8
    best: 69.7 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on DiDeMo
    text-to-video R@1 · uses extra data · 2022-12-06
    57.9
    best: 74.2 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on DiDeMo
    video-to-text R@1 · uses extra data · 2022-12-06
    59.1
    best: 71.9 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on MSR-VTT
    text-to-video R@1 · uses extra data · 2022-12-06
    55.2
    best: 64 (GRAM)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on LSMDC
    video-to-text R@1 · uses extra data · 2022-12-06
    34.9
    best: 46.7 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on MSVD
    video-to-text R@1 · uses extra data · 2022-12-06
    76.3
    best: 85.2 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on VATEX
    text-to-video R@1 · 2022-12-06
    49.5
    best: 83.9 (GRAM)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on VATEX
    video-to-text R@1 · 2022-12-06
    69.5
    best: 85.4 (InternVideo2-1B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@1 · uses extra data · 2022-12-06
    40.7
    best: 55.9 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on MSR-VTT
    video-to-text R@1 · uses extra data · 2022-12-06
    39.6
    best: 53.7 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on MSVD
    video-to-text R@1 · uses extra data · 2022-12-06
    67.6
    best: 83.3 (InternVideo2-1B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on DiDeMo
    video-to-text R@1 · uses extra data · 2022-12-06
    33.5
    best: 57.1 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on DiDeMo
    video-to-text R@10 · uses extra data · 2022-12-06
    71.1
    best: 85 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on DiDeMo
    video-to-text R@5 · uses extra data · 2022-12-06
    60.3
    best: 79.9 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@1 · uses extra data · 2022-12-06
    17.6
    best: 33.8 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@10 · uses extra data · 2022-12-06
    40.2
    best: 62.2 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on LSMDC
    video-to-text R@1 · uses extra data · 2022-12-06
    13.2
    best: 30.1 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on LSMDC
    video-to-text R@10 · uses extra data · 2022-12-06
    34.9
    best: 54.8 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on LSMDC
    video-to-text R@5 · uses extra data · 2022-12-06
    27.8
    best: 47.7 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on ActivityNet
    video-to-text R@1 · uses extra data · 2022-12-06
    31.4
    best: 56.5 (InternVideo2-6B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • State Change Object Detection on Ego4D
    AP · 2022-11-17
    37.19
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • State Change Object Detection on Ego4D
    AP50 · 2022-11-17
    55.97
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • State Change Object Detection on Ego4D
    AP75 · 2022-11-17
    38.44
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Short-term Object Interaction Anticipation on Ego4D
    Noun (Top5 mAP) · 2022-11-17
    24.6
    best: 34.886 (SOIA-DOD)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Short-term Object Interaction Anticipation on Ego4D
    Noun+TTC (Top5 mAP) · 2022-11-17
    7.64
    best: 12.41 (EgoVideo)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Short-term Object Interaction Anticipation on Ego4D
    Noun+Verb (Top5 mAP) · 2022-11-17
    9.18
    best: 17.614 (SOIA-DOD)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Short-term Object Interaction Anticipation on Ego4D
    Overall (Top5 mAP) · 2022-11-17
    3.4
    best: 7.21 (EgoVideo)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Future Hand Prediction on Ego4D
    C.Disp(Left) · 2022-11-17
    53.33
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Future Hand Prediction on Ego4D
    C.Disp(Right) · 2022-11-17
    53.37
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Future Hand Prediction on Ego4D
    Disp(Total) · 2022-11-17
    196.8
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Future Hand Prediction on Ego4D
    M.Disp(Left) · 2022-11-17
    43.25
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Future Hand Prediction on Ego4D
    M.Disp(Right) · 2022-11-17
    46.25
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Video on ActivityNet-1.3
    mAP · uses extra data · 2022-12-06
    39
    best: 42.9 (RDFA-S6 (InternVideo2-6B))
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on MSR-VTT
    video-to-text R@1 · uses extra data · 2022-12-06
    57.9
    best: 64.8 (GRAM)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on LSMDC
    text-to-video R@1 · uses extra data · 2022-12-06
    34
    best: 46.4 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video on MSVD
    text-to-video R@1 · uses extra data · 2022-12-06
    58.4
    best: 61.4 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Temporal Action Localization on ActivityNet-1.3
    mAP · uses extra data · 2022-12-06
    39
    best: 42.9 (RDFA-S6 (InternVideo2-6B))
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Localization on ActivityNet-1.3
    mAP · uses extra data · 2022-12-06
    39
    best: 42.9 (RDFA-S6 (InternVideo2-6B))
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on MSR-VTT
    video-to-text R@1 · uses extra data · 2022-12-06
    57.9
    best: 64.8 (GRAM)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on LSMDC
    text-to-video R@1 · uses extra data · 2022-12-06
    34
    best: 46.4 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Retrieval on MSVD
    text-to-video R@1 · uses extra data · 2022-12-06
    58.4
    best: 61.4 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@1 · uses extra data · 2022-12-06
    43.4
    best: 59.3 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on DiDeMo
    text-to-video R@1 · uses extra data · 2022-12-06
    31.5
    best: 57.9 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on DiDeMo
    text-to-video R@10 · uses extra data · 2022-12-06
    68.2
    best: 85.1 (InternVideo2-1B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on DiDeMo
    text-to-video R@5 · uses extra data · 2022-12-06
    57.6
    best: 80 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@5 · uses extra data · 2022-12-06
    32.4
    best: 55.9 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Video Retrieval on ActivityNet
    text-to-video R@1 · uses extra data · 2022-12-06
    30.7
    best: 63.2 (InternVideo2-6B)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)

Natural Language Processing · 10 results

  • Question Answering on EgoSchema (fullset)
    Accuracy · 2022-12-06
    32.1
    best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Visual Question Answering (VQA) on TGIF-QA
    Accuracy · 2022-12-06
    0.722
    best: 0.732 (HiTeA)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Natural Language Queries on Ego4D
    R@1 IoU=0.3 · 2022-11-17
    16.45
    best: 28.05 (EgoVideo)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Natural Language Queries on Ego4D
    R@1 IoU=0.5 · 2022-11-17
    10.06
    best: 19.31 (EgoVideo)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Natural Language Queries on Ego4D
    R@1 Mean(0.3 and 0.5) · 2022-11-17
    13.26
    best: 23.68 (EgoVideo)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Natural Language Queries on Ego4D
    R@5 IoU=0.3 · 2022-11-17
    22.95
    best: 45.63 (DeCafNet-100%)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Natural Language Queries on Ego4D
    R@5 IoU=0.5 · 2022-11-17
    16.1
    best: 33.93 (DeCafNet-100%)
    SOTA
    InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges (arXiv:2211.09529)
  • Question Answering on STAR Benchmark
    Accuracy · 2022-12-06
    41.6
    best: 59 (VideoChat2)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Visual Question Answering (VQA) on MSRVTT-QA
    Accuracy · uses extra data · 2022-12-06
    0.471
    best: 0.496 (VLAB)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Visual Question Answering (VQA) on MSVD-QA
    Accuracy · uses extra data · 2022-12-06
    0.555
    best: 0.61 (VLAB)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)

Robots · 5 results

  • Activity Recognition on Something-Something V1
    Top 1 Accuracy · uses extra data · 2022-12-06
    70
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Activity Recognition on Something-Something V2
    Top-1 Accuracy · uses extra data · 2022-12-06
    77.2
    best: 77.3 (MVD (Kinetics400 pretrain, ViT-H, 16 frame))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Activity Recognition on AVA v2.2
    mAP · uses extra data · 2022-12-06
    41.01
    best: 45.1 (LART (Hiera-H, K700 PT+FT))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Activity Recognition on UCF101-MiTv2
    AUROC · 2022-12-06
    91.85
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Activity Recognition on UCF-HMDB
    AUROC · 2022-12-06
    85.48
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)

Time Series · 5 results

  • Action Recognition on Something-Something V1
    Top 1 Accuracy · uses extra data · 2022-12-06
    70
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Recognition on Something-Something V2
    Top-1 Accuracy · uses extra data · 2022-12-06
    77.2
    best: 77.3 (MVD (Kinetics400 pretrain, ViT-H, 16 frame))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Recognition on AVA v2.2
    mAP · uses extra data · 2022-12-06
    41.01
    best: 45.1 (LART (Hiera-H, K700 PT+FT))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Recognition on UCF101-MiTv2
    AUROC · 2022-12-06
    91.85
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Action Recognition on UCF-HMDB
    AUROC · 2022-12-06
    85.48
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)

Methodology · 3 results

  • Zero-Shot Learning on HACS
    Average-mAP · 2022-12-06
    41.55
    best: 45.8 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Learning on FineAction
    mAP · 2022-12-06
    17.57
    best: 29.6 (RDFA-S6 (InternVideo2-6B))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Zero-Shot Learning on ActivityNet-1.3
    mAP · uses extra data · 2022-12-06
    39
    best: 42.9 (RDFA-S6 (InternVideo2-6B))
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)

Reasoning · 3 results

  • Video Question Answering on STAR Benchmark
    Average Accuracy · 2022-12-06
    58.7
    best: 67.1 (VLAP (4 frames))
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Question Answering on EgoSchema (fullset)
    Accuracy · 2022-12-06
    32.1
    best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
    SOTA
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)
  • Video Question Answering on STAR Benchmark
    Accuracy · 2022-12-06
    41.6
    best: 59 (VideoChat2)
    InternVideo: General Video Foundation Models via Generative and Discriminative Learning (arXiv:2212.03191)