Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Question Answering (VQA) on ScanQA Test w/ objects

Metric: CIDEr (higher is better)

Results

| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | LLaVA-3D | 103.1 | No | LLaVA-3D: A Simple yet Effective Pathway to Empo... | 2024-09-26 | - |
| 2 | Video-3D LLM | 102.1 | No | Video-3D LLM: Learning Position-Aware Video Repr... | 2024-11-30 | Code |
| 3 | LEO | 101.4 | No | An Embodied Generalist Agent in 3D World | 2023-11-18 | Code |
| 4 | ChatScene | 87.7 | No | Chat-Scene: Bridging 3D Scene and Large Language... | 2023-12-13 | Code |
| 5 | Chat-3D v2 | 87.6 | No | Chat-Scene: Bridging 3D Scene and Large Language... | 2023-12-13 | Code |
| 6 | BridgeQA | 83.75 | No | Bridging the Gap between 2D and 3D Visual Questi... | 2024-02-24 | Code |
| 7 | NaviLLM | 80.77 | No | Towards Learning a Generalist Model for Embodied... | 2023-12-04 | Code |
| 8 | Scene-LLM | 80 | No | Scene-LLM: Extending Language Model for 3D Visua... | 2024-03-18 | - |
| 9 | LL3DA | 76.8 | No | Visual Instruction Tuning | 2023-04-17 | Code |
| 10 | 3D-VisTA | 69.6 | No | 3D-VisTA: Pre-trained Transformer for 3D Vision ... | 2023-08-08 | Code |
| 11 | 3D-LLM (BLIP2-flant5) | 69.6 | No | 3D-LLM: Injecting the 3D World into Large Langua... | 2023-07-24 | Code |
| 12 | ScanQA | 67.29 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |
| 13 | 3D-LLM (BLIP2-opt) | 67.1 | No | 3D-LLM: Injecting the 3D World into Large Langua... | 2023-07-24 | Code |
| 14 | 3D-LLM (flamingo) | 65.6 | No | 3D-LLM: Injecting the 3D World into Large Langua... | 2023-07-24 | Code |
| 15 | VoteNet+MCAN | 58.23 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |
| 16 | ScanRefer+MCAN | 57.56 | No | ScanQA: 3D Question Answering for Spatial Scene ... | 2021-12-20 | Code |
| 17 | VideoChat2 | 49.2 | No | MVBench: A Comprehensive Multi-modal Video Under... | 2023-11-28 | Code |
| 18 | LLaVA-NeXT-Video | 46.2 | No | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06 | Code |