Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Otter

Reported on 7 benchmarks across 3 tasks · 3 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 4 results

Visual Question Answering (VQA) on InfiMM-Eval
Abductive · 2023-05-05
33.64
best: 77.88 (GPT-4V)
Otter: A Multi-Modal Model with In-Context Instruction Tuning arXiv:2305.03726

Visual Question Answering (VQA) on InfiMM-Eval
Analogical · 2023-05-05
13.33
best: 69.86 (GPT-4V)
Otter: A Multi-Modal Model with In-Context Instruction Tuning arXiv:2305.03726

Visual Question Answering (VQA) on InfiMM-Eval
Deductive · 2023-05-05
22.49
best: 74.86 (GPT-4V)
Otter: A Multi-Modal Model with In-Context Instruction Tuning arXiv:2305.03726

Visual Question Answering (VQA) on InfiMM-Eval
Overall score · 2023-05-05
22.69
best: 74.44 (GPT-4V)
Otter: A Multi-Modal Model with In-Context Instruction Tuning arXiv:2305.03726

Reasoning · 3 results

Emotion Interpretation on EIBench (complex)
Recall · 2025-04-10
27.9
best: 39.27 (ChatGPT-4o)
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models arXiv:2504.07521

Emotion Interpretation on EIBench
Recall · 2025-04-10
42.81
best: 63.24 (Claude-3-haiku)
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models arXiv:2504.07521

Visual Reasoning on Bongard-OpenWorld
2-Class Accuracy · 2023-10-16
49.3
best: 93.6 (Gemini-2.0 + CA)
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World arXiv:2310.10207