Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Gemini 1.5 Pro

Reported on 7 benchmarks across 2 tasks · 1 paper · 7 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Reasoning (4 results)

  • Video Question Answering on TVBench
    Average Accuracy · 2024-03-08
    47.6
    best: 63.6 (Seed1.5-VL thinking)
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530
  • Video Question Answering on Video-MME (w/o subs)
    Accuracy (%) · 2024-03-08
    71.9
    best: 77.4 (Video-RAG (based on LLaVA-Video))
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530
  • Video Question Answering on Zero-shot Video Question Answering on LongVideoBench
    Accuracy (%) · uses extra data · 2024-03-08
    66.7
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530
  • Video Question Answering on Video-MME
    Accuracy (%) · 2024-03-08
    81.3
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530

Natural Language Processing (3 results)

  • Question Answering on Video-MME (w/o subs)
    Accuracy (%) · 2024-03-08
    71.9
    best: 77.4 (Video-RAG (based on LLaVA-Video))
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530
  • Question Answering on Zero-shot Video Question Answering on LongVideoBench
    Accuracy (%) · uses extra data · 2024-03-08
    66.7
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530
  • Question Answering on Video-MME
    Accuracy (%) · 2024-03-08
    81.3
    SOTA
    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context · arXiv:2403.05530