Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LLaVA-1.5

Reported on 19 benchmarks across 3 tasks · 2 papers · 10 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
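As a rough illustration of what exact-name matching implies, the sketch below groups result records by the literal model-name string. The record layout, the field names, and the `group_by_exact_name` helper are assumptions made for this example, not the site's actual implementation. The third record is invented to show that a differently spelled name lands in a separate group.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions for illustration.
# The first two pairings (benchmark, paper) are taken from this page.
# The third record is made up to show a differently spelled model name.
results = [
    {"model": "LLaVA-1.5", "benchmark": "Winoground", "paper": "arXiv:2311.17076"},
    {"model": "LLaVA-1.5", "benchmark": "InfiMM-Eval", "paper": "arXiv:2310.03744"},
    {"model": "llava-1.5", "benchmark": "made-up-benchmark", "paper": "made-up-paper"},
]


def group_by_exact_name(records):
    """Group records by the model-name string, with no normalization."""
    groups = defaultdict(list)
    for record in records:
        groups[record["model"]].append(record)
    return dict(groups)


groups = group_by_exact_name(results)
# Papers that end up on the same model page under exact matching.
papers = {record["paper"] for record in groups["LLaVA-1.5"]}
```

Under this kind of matching, two papers that both write "LLaVA-1.5" share one page even if they evaluated different checkpoints, while a variant spelling such as "llava-1.5" is listed separately.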

Natural Language Processing · 16 results

  • Visual Question Answering (VQA) on 6-DoF SpatialBench
    Orientation-abs · 2023-10-05
    25.8
    best: 31.3 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on 6-DoF SpatialBench
    Orientation-rel · 2023-10-05
    28.3
    best: 54.6 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on 6-DoF SpatialBench
    Position-abs · 2023-10-05
    24.5
    best: 33.8 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on 6-DoF SpatialBench
    Position-rel · 2023-10-05
    30.9
    best: 59.6 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on 6-DoF SpatialBench
    Total · 2023-10-05
    27.2
    best: 43.9 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering on 6-DoF SpatialBench
    Orientation-abs · 2023-10-05
    25.8
    best: 31.3 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering on 6-DoF SpatialBench
    Orientation-rel · 2023-10-05
    28.3
    best: 54.6 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering on 6-DoF SpatialBench
    Position-abs · 2023-10-05
    24.5
    best: 33.8 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering on 6-DoF SpatialBench
    Position-rel · 2023-10-05
    30.9
    best: 59.6 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering on 6-DoF SpatialBench
    Total · 2023-10-05
    27.2
    best: 43.9 (SoFar)
    SOTA
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on AutoHallusion
    Overall Accuracy · 2023-10-05
    44.5
    best: 66 (GPT-4V)
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on InfiMM-Eval
    Abductive · 2023-10-05
    47.91
    best: 77.88 (GPT-4V)
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on InfiMM-Eval
    Analogical · 2023-10-05
    24.31
    best: 69.86 (GPT-4V)
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on InfiMM-Eval
    Deductive · 2023-10-05
    30.94
    best: 74.86 (GPT-4V)
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on InfiMM-Eval
    Overall score · 2023-10-05
    32.62
    best: 74.44 (GPT-4V)
    Improved Baselines with Visual Instruction Tuning · arXiv:2310.03744
  • Visual Question Answering (VQA) on HallusionBench
    Question Pair Acc
    4.3307
    best: 12.2047 (GPT-4V)

Reasoning · 3 results

  • Visual Reasoning on Winoground
    Group Score · 2023-11-27
    20.1
    best: 58.75 (GPT-4V (CoT, pick b/w two options))
    Compositional Chain-of-Thought Prompting for Large Multimodal Models · arXiv:2311.17076
  • Visual Reasoning on Winoground
    Image Score · 2023-11-27
    33.3
    best: 68.75 (GPT-4V (CoT, pick b/w two options))
    Compositional Chain-of-Thought Prompting for Large Multimodal Models · arXiv:2311.17076
  • Visual Reasoning on Winoground
    Text Score · 2023-11-27
    36
    best: 75.5 (GPT-4o + CA)
    Compositional Chain-of-Thought Prompting for Large Multimodal Models · arXiv:2311.17076