Visual Question Answering (VQA) on ScanQA Test w/ objects
Metric: BLEU-4 (higher is better)
Results
| # | Model | BLEU-4 | Extra Data | Paper | Date | Code |
|---|-------|--------|------------|-------|------|------|
| 1 | BridgeQA | 24.06 | No | Bridging the Gap between 2D and 3D Visual Questi... | 2024-02-24 | Code |
| 2 | LLaVA-3D | 16.4 | No | LLaVA-3D: A Simple yet Effective Pathway to Empo... | 2024-09-26 | - |
| 3 | ChatScene | 14.3 | No | Chat-Scene: Bridging 3D Scene and Large Language... | 2023-12-13 | Code |
| 4 | Chat-3D v2 | 14 | No | Chat-Scene: Bridging 3D Scene and Large Language... | 2023-12-13 | Code |
| 5 | NaviLLM | 13.9 | No | Towards Learning a Generalist Model for Embodied... | 2023-12-04 | Code |
| 6 | LL3DA | 13.5 | No | Visual Instruction Tuning | 2023-04-17 | Code |
| 7 | LEO | 13.2 | No | An Embodied Generalist Agent in 3D World | 2023-11-18 | Code |
| 8 | ScanQA | 12.04 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |
| 9 | Scene-LLM | 12 | No | Scene-LLM: Extending Language Model for 3D Visua... | 2024-03-18 | - |
| 10 | 3D-LLM (BLIP2-flant5) | 11.6 | No | 3D-LLM: Injecting the 3D World into Large Langua... | 2023-07-24 | Code |
| 11 | 3D-LLM (BLIP2-opt) | 10.7 | No | 3D-LLM: Injecting the 3D World into Large Langua... | 2023-07-24 | Code |
| 12 | 3D-VisTA | 10.4 | No | 3D-VisTA: Pre-trained Transformer for 3D Vision ... | 2023-08-08 | Code |
| 13 | LLaVA-NeXT-Video | 9.8 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |
| 14 | VideoChat2 | 9.6 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 15 | 3D-LLM (flamingo) | 8.4 | No | 3D-LLM: Injecting the 3D World into Large Langua... | 2023-07-24 | Code |
| 16 | ScanRefer+MCAN | 7.46 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |
| 17 | VoteNet+MCAN | 6.08 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |