SQA3D

Situated Question Answering in 3D Scenes

3D | Images | Texts | Videos | CC-BY-4.0 | Introduced 2023-01-30

SQA3D is a dataset for embodied scene understanding, in which an agent must comprehend the scene it is situated in from a first-person perspective and answer questions about it. The questions are designed to be situated, embodied, and knowledge-intensive. The dataset offers three modalities for representing a 3D scene: a 3D scan, an egocentric video, and a bird's-eye-view (BEV) picture.