Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on SQA3D

Metric: Exact Match (higher is better)
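Exact Match counts a prediction as correct only when it is identical to a reference answer, and the leaderboard values read as the percentage of questions answered correctly. The minimal sketch below illustrates the idea. The normalization steps (lowercasing, dropping punctuation and English articles, collapsing whitespace) and the list-of-reference-answers format are assumptions borrowed from common QA scorers; the official SQA3D evaluation script may normalize differently. `normalize_answer` and `exact_match` are hypothetical helper names.

```python
import re
import string


def normalize_answer(s: str) -> str:
    """Hypothetical normalization (assumption, not the official SQA3D scorer):
    lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions whose normalized prediction equals
    at least one normalized reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize_answer(pred) in {normalize_answer(r) for r in refs}
        for pred, refs in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The chair.", "two", "left"]
    refs = [["chair"], ["2", "two"], ["right"]]
    # Two of three predictions match after normalization.
    print(f"{exact_match(preds, refs):.2f}")  # prints 66.67
```

Because the match is all-or-nothing per question, a paraphrased but correct answer ("on the left side" vs. "left") scores zero, which is one reason Exact Match numbers depend heavily on how answers are normalized.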


Results

| # | Model | Exact Match | Extra Data | Paper | Date | Code |
|---|-------|-------------|------------|-------|------|------|
| 1 | LLaVA-3D | 60.1 | No | LLaVA-3D: A Simple yet Effective Pathway to Empo... | 2024-09-26 | - |
| 2 | Video-3D LLM | 58.6 | No | Video-3D LLM: Learning Position-Aware Video Repr... | 2024-11-30 | Code |
| 3 | Chat-3D v2 | 54.7 | No | Chat-Scene: Bridging 3D Scene and Large Language... | 2023-12-13 | Code |
| 4 | ChatScene | 54.6 | No | Chat-Scene: Bridging 3D Scene and Large Language... | 2023-12-13 | Code |
| 5 | Scene-LLM | 54.2 | No | Scene-LLM: Extending Language Model for 3D Visua... | 2024-03-18 | - |
| 6 | LEO | 50 | No | An Embodied Generalist Agent in 3D World | 2023-11-18 | Code |
| 7 | LLaVA-Video | 48.5 | No | Video Instruction Tuning With Synthetic Data | 2024-10-03 | - |
| 8 | 3D-VisTA | 48.5 | No | 3D-VisTA: Pre-trained Transformer for 3D Vision ... | 2023-08-08 | Code |
| 9 | ScanQA | 47.2 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |
| 10 | PQ3D | 47.1 | No | Unifying 3D Vision-Language Understanding via Pr... | 2024-05-19 | - |
| 11 | Scan2Cap | 41 | No | Scan2Cap: Context-aware Dense Captioning in RGB-... | 2020-12-03 | - |
| 12 | VideoChat2 | 37.3 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 13 | LLaVA-NeXT-Video | 34.2 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |