Visual Question Answering (VQA) on InfiMM-Eval
Metric: Analogical (higher is better)
Leaderboard
| #  | Model                 | Analogical | Extra Data | Paper                                                | Date       |
|----|-----------------------|------------|------------|------------------------------------------------------|------------|
| 1  | GPT-4V                | 69.86      | No         | GPT-4 Technical Report                               | 2023-03-15 |
| 2  | Qwen-VL-Chat          | 30.42      | No         | Qwen-VL: A Versatile Vision-Language Model for U...  | 2023-08-24 |
| 3  | CogVLM-Chat           | 28.75      | No         | CogVLM: Visual Expert for Pretrained Language Mo...  | 2023-11-06 |
| 4  | LLaVA-1.5             | 24.31      | No         | Improved Baselines with Visual Instruction Tuning    | 2023-10-05 |
| 5  | LLaMA-Adapter V2      | 22.08      | No         | LLaMA-Adapter V2: Parameter-Efficient Visual Ins...  | 2023-04-28 |
| 6  | SPHINX v2             | 20.69      | No         | SPHINX: The Joint Mixing of Weights, Tasks, and ...  | 2023-11-13 |
| 7  | InstructBLIP          | 20.56      | No         | InstructBLIP: Towards General-purpose Vision-Lan...  | 2023-05-11 |
| 8  | InternLM-XComposer-VL | 18.61      | No         | InternLM-XComposer: A Vision-Language Large Mode...  | 2023-09-26 |
| 9  | Emu                   | 18.19      | No         | Emu: Generative Pretraining in Multimodality         | 2023-07-11 |
| 10 | Otter                 | 13.33      | No         | Otter: A Multi-Modal Model with In-Context Instr...  | 2023-05-05 |
| 11 | mPLUG-Owl2            | 7.64       | No         | mPLUG-Owl2: Revolutionizing Multi-modal Large La...  | 2023-11-07 |
| 12 | BLIP-2-OPT2.7B        | 7.50       | No         | BLIP-2: Bootstrapping Language-Image Pre-trainin...  | 2023-01-30 |
| 13 | MiniGPT-v2            | 5.69       | No         | MiniGPT-4: Enhancing Vision-Language Understandi...  | 2023-04-20 |
| 14 | OpenFlamingo-v2       | 1.11       | No         | OpenFlamingo: An Open-Source Framework for Train...  | 2023-08-02 |
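For programmatic analysis, the leaderboard above can be treated as a small list of records and re-sorted by the reported metric. The sketch below is illustrative only: the field names (`model`, `analogical`, `date`) are my own choice, not an official schema or API of the site.

```python
# Leaderboard rows as plain records (scores and dates copied from the table).
leaderboard = [
    {"model": "GPT-4V",                "analogical": 69.86, "date": "2023-03-15"},
    {"model": "Qwen-VL-Chat",          "analogical": 30.42, "date": "2023-08-24"},
    {"model": "CogVLM-Chat",           "analogical": 28.75, "date": "2023-11-06"},
    {"model": "LLaVA-1.5",             "analogical": 24.31, "date": "2023-10-05"},
    {"model": "LLaMA-Adapter V2",      "analogical": 22.08, "date": "2023-04-28"},
    {"model": "SPHINX v2",             "analogical": 20.69, "date": "2023-11-13"},
    {"model": "InstructBLIP",          "analogical": 20.56, "date": "2023-05-11"},
    {"model": "InternLM-XComposer-VL", "analogical": 18.61, "date": "2023-09-26"},
    {"model": "Emu",                   "analogical": 18.19, "date": "2023-07-11"},
    {"model": "Otter",                 "analogical": 13.33, "date": "2023-05-05"},
    {"model": "mPLUG-Owl2",            "analogical": 7.64,  "date": "2023-11-07"},
    {"model": "BLIP-2-OPT2.7B",        "analogical": 7.50,  "date": "2023-01-30"},
    {"model": "MiniGPT-v2",            "analogical": 5.69,  "date": "2023-04-20"},
    {"model": "OpenFlamingo-v2",       "analogical": 1.11,  "date": "2023-08-02"},
]

# Rank by the Analogical metric (higher is better) and measure the
# gap between the top entry and the runner-up.
ranked = sorted(leaderboard, key=lambda r: r["analogical"], reverse=True)
best, runner_up = ranked[0], ranked[1]
gap = best["analogical"] - runner_up["analogical"]
print(f"{best['model']} leads {runner_up['model']} by {gap:.2f} points")
# → GPT-4V leads Qwen-VL-Chat by 39.44 points
```

Sorting locally like this makes it easy to recompute rankings under a different column (e.g. by `date`) without re-scraping the page.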