Visual Question Answering on EmbSpatial-Bench

Metric: Generation (higher is better)

Results

#	Model	Generation	Extra Data	Paper	Date	Code
1	SoFar	70.88	No	SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation	2025-02-18	Code
2	Qwen-VL-Max	49.11	No	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	2023-08-24	Code
3	GPT-4V	36.07	No	GPT-4 Technical Report	2023-03-15	Code
4	LLaVA-1.6	35.19	No	Visual Instruction Tuning	2023-04-17	Code
5	MiniGPT4	23.54	No	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	2023-04-20	Code

#1 SoFar (SOTA)
70.88
Generation · 2025-02-18
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

#2 Qwen-VL-Max (SOTA)
49.11
Generation · 2023-08-24
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

#3 GPT-4V (SOTA)
36.07
Generation · 2023-03-15
GPT-4 Technical Report

#4 LLaVA-1.6
35.19
Generation · 2023-04-17
Visual Instruction Tuning

#5 MiniGPT4
23.54
Generation · 2023-04-20
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models