Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on 6-DoF SpatialBench

Metric: Total (higher is better)


Results

| # | Model | Total | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | SoFar | 43.9 | No | SoFar: Language-Grounded Orientation Bridges Spa... | 2025-02-18 | Code |
| 2 | GPT-4o | 36.2 | No | GPT-4o System Card | 2024-10-25 | - |
| 3 | RoboPoint | 33.5 | No | RoboPoint: A Vision-Language Model for Spatial A... | 2024-06-15 | - |
| 4 | SpatialBot | 32.7 | No | SpatialBot: Precise Spatial Understanding with V... | 2024-06-19 | Code |
| 5 | SpaceMantis | 28.9 | No | SpatialVLM: Endowing Vision-Language Models with... | 2024-01-22 | - |
| 6 | SpaceLLaVA | 28.2 | No | SpatialVLM: Endowing Vision-Language Models with... | 2024-01-22 | - |
| 7 | LLaVA-1.5 | 27.2 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code |
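The ranking above follows directly from the stated metric, Total (higher is better): rows are sorted by Total in descending order and ranks assigned in that order. A minimal sketch of that ranking in Python, using the scores from the table (the `rows` structure and field names are assumptions for illustration, not the site's actual data format):

```python
# Hypothetical sketch: rank leaderboard entries on the "Total" metric,
# where higher is better. Scores are taken from the table above.
rows = [
    {"model": "LLaVA-1.5", "total": 27.2},
    {"model": "SoFar", "total": 43.9},
    {"model": "GPT-4o", "total": 36.2},
    {"model": "RoboPoint", "total": 33.5},
    {"model": "SpatialBot", "total": 32.7},
    {"model": "SpaceMantis", "total": 28.9},
    {"model": "SpaceLLaVA", "total": 28.2},
]

# Sort descending on Total, then assign 1-based ranks.
ranked = sorted(rows, key=lambda r: r["total"], reverse=True)
for rank, row in enumerate(ranked, start=1):
    print(f"{rank}. {row['model']}: {row['total']}")
```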