Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GPT-4V

Reported on 34 benchmarks across 8 tasks · 6 papers · 22 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 21 results

  • Visual Question Answering (VQA) on AutoHallusion
    Overall Accuracy · 2024-06-16
    66
    SOTA
    AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models · arXiv:2406.10900
  • Visual Question Answering (VQA) on HallusionBench
    Question Pair Acc · 2023-10-23
    12.2047
    SOTA
    HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models · arXiv:2310.14566
  • Visual Question Answering (VQA) on CORE-MM
    Abductive · 2023-03-15
    77.88
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on CORE-MM
    Analogical · 2023-03-15
    69.86
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on CORE-MM
    Deductive · 2023-03-15
    74.86
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on CORE-MM
    Overall score · 2023-03-15
    74.44
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on InfiMM-Eval
    Abductive · 2023-03-15
    77.88
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on InfiMM-Eval
    Analogical · 2023-03-15
    69.86
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on InfiMM-Eval
    Deductive · 2023-03-15
    74.86
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on InfiMM-Eval
    Overall score · 2023-03-15
    74.44
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on BenchLMM
    GPT-3.5 score · uses extra data · 2023-03-15
    58.37
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering (VQA) on EmbSpatial-Bench
    Generation · 2023-03-15
    36.07
    best: 70.88 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering on BenchLMM
    GPT-3.5 score · uses extra data · 2023-03-15
    58.37
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Visual Question Answering on EmbSpatial-Bench
    Generation · 2023-03-15
    36.07
    best: 70.88 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    1 Image, 2*2 Stitching, Exact Accuracy · 2023-03-15
    86.09
    best: 94.6 (GPT-4o)
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    1 Image, 4*4 Stitching, Exact Accuracy · 2023-03-15
    54.72
    best: 83 (GPT-4o)
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    1 Image, 8*8 Stitching, Exact Accuracy · 2023-03-15
    7.3
    best: 29.81 (Gemini Pro 1.5)
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    10 Images, 1*1 Stitching, Exact Accuracy · 2023-03-15
    72.36
    best: 97 (GPT-4o)
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    10 Images, 2*2 Stitching, Exact Accuracy · 2023-03-15
    34.24
    best: 81.8 (GPT-4o)
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    10 Images, 4*4 Stitching, Exact Accuracy · 2023-03-15
    7.58
    best: 26.9 (GPT-4o)
    GPT-4 Technical Report · arXiv:2303.08774
  • Long-Context Understanding on MMNeedle
    10 Images, 8*8 Stitching, Exact Accuracy
    0
    best: 1 (GPT-4o)

Robots · 5 results

  • Object Rearrangement on Open6DOR V2
    pos-level0 · 2023-03-15
    39.1
    best: 96 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Object Rearrangement on Open6DOR V2
    pos-level1 · 2023-03-15
    46.8
    best: 81.5 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Object Rearrangement on Open6DOR V2
    rot-level0 · 2023-03-15
    9.1
    best: 68.6 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Object Rearrangement on Open6DOR V2
    rot-level1 · 2023-03-15
    6.9
    best: 42.2 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Object Rearrangement on Open6DOR V2
    rot-level2 · 2023-03-15
    11.7
    best: 70.1 (SoFar)
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774

Reasoning · 4 results

  • Multimodal Reasoning on REBUS
    Accuracy · 2024-01-11
    24
    SOTA
    REBUS: A Robust Evaluation Benchmark of Understanding Symbols · arXiv:2401.05604
  • Visual Reasoning on Winoground
    Group Score · 2024-01-05
    37.75
    best: 58.75 (GPT-4V (CoT, pick b/w two options))
    CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs · arXiv:2401.02582
  • Visual Reasoning on Winoground
    Image Score · 2024-01-05
    42.5
    best: 68.75 (GPT-4V (CoT, pick b/w two options))
    CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs · arXiv:2401.02582
  • Visual Reasoning on Winoground
    Text Score · 2024-01-05
    54.5
    best: 75.5 (GPT-4o + CA)
    CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs · arXiv:2401.02582

Other · 3 results

  • Factual Inconsistency Detection in Chart Captioning on CHOCOLATE-LLM
    Kendall's Tau-c · 2023-03-15
    0.205
    SOTA
    GPT-4 Technical Report · arXiv:2303.08774
  • Factual Inconsistency Detection in Chart Captioning on CHOCOLATE-LVLM
    Kendall's Tau-c
    0.157
    best: 0.178 (ChartVE)
  • Factual Inconsistency Detection in Chart Captioning on CHOCOLATE-FT
    Kendall's Tau-c
    0.215
    best: 0.291 (Bard (before Gemini))

Computer Vision · 1 result

  • MMR total on MRR-Benchmark
    Total Column Score · uses extra data · 2023-09-29
    415
    best: 463 (Claude 3.5 Sonnet)
    SOTA
    The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) · arXiv:2309.17421