Visual Question Answering (VQA) on SQA3D
Metric: Exact Match (higher is better)
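For readers unfamiliar with the metric, the following is a minimal sketch of how an exact-match percentage could be computed. It is not the official SQA3D evaluation script: the function names `normalize` and `exact_match` are hypothetical, and the normalization (lowercasing and whitespace collapsing) is an assumption that may differ from what the benchmark applies.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.lower().split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions equal to their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # A prediction counts only if it matches the reference string exactly
    # after normalization; there is no partial credit.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Illustrative, made-up answers: two of three predictions match.
score = exact_match(["Chair", "left", "two"], ["chair", "right", "two"])
print(f"{score:.1f}")  # prints 66.7
```

Under this sketch, a score such as 60.1 would mean that about 60 of every 100 predicted answers matched the reference answer exactly.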
Results
| # | Model | Exact Match | Extra Data | Paper | Date | Code | SOTA |
|---|-------|-------------|------------|-------|------|------|------|
| 1 | LLaVA-3D | 60.1 | No | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | 2024-09-26 | - | SOTA |
| 2 | Video-3D LLM | 58.6 | No | Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | 2024-11-30 | Yes | - |
| 3 | Chat-3D v2 | 54.7 | No | Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | 2023-12-13 | Yes | SOTA |
| 4 | ChatScene | 54.6 | No | Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | 2023-12-13 | Yes | - |
| 5 | Scene-LLM | 54.2 | No | Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | 2024-03-18 | - | - |
| 6 | LEO | 50 | No | An Embodied Generalist Agent in 3D World | 2023-11-18 | Yes | SOTA |
| 7 | LLaVA-Video | 48.5 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - | - |
| 8 | 3D-VisTA | 48.5 | No | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | 2023-08-08 | Yes | SOTA |
| 9 | ScanQA | 47.2 | No | ScanQA: 3D Question Answering for Spatial Scene Understanding | 2021-12-20 | Yes | SOTA |
| 10 | PQ3D | 47.1 | No | Unifying 3D Vision-Language Understanding via Promptable Queries | 2024-05-19 | - | - |
| 11 | Scan2Cap | 41 | No | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | 2020-12-03 | - | SOTA |
| 12 | VideoChat2 | 37.3 | No | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | 2023-11-28 | Yes | - |
| 13 | LLaVA-NeXT-Video | 34.2 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Yes | - |