ScanQA

ScanQA: 3D Question Answering for Spatial Scene Understanding

Introduced 2021-12-20

We collected 41,363 questions and 58,191 answers, including 32,337 unique questions and 16,999 unique answers. Table 2 presents the statistics of the ScanQA dataset. This dataset is an order of magnitude larger than existing embodied question-answering datasets in terms of both the number of questions and their variation. For example, the training set of the EQA dataset contains 4,246 questions, of which 147 are unique, and the training set of the EQA-MP3D dataset contains 767 questions, of which 174 are unique. Considering that our dataset contains not only question–answer pairs but also 3D object localization annotations, we believe this is the largest dataset that describes the nature of objects in 3D scenes in question-answering form. The accompanying figure shows the distribution of the questions based on their first word. We collected various types of questions through automatic question generation followed by human editing.
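
The kind of statistics reported above (total and unique questions and answers, and the first-word distribution) can be recomputed with a few lines of Python. This is only a minimal sketch: the record layout (a `question` string plus an `answers` list per entry), the `summarize` helper, and the three sample records are assumptions made for illustration, not the dataset's documented schema or actual contents.

```python
from collections import Counter


def summarize(records):
    """Count total/unique questions and answers and tally first words.

    Assumes each record is a dict with a "question" string and an
    "answers" list of strings (a hypothetical layout for this sketch).
    """
    questions = [r["question"] for r in records]
    # One question may carry several answers, so flatten them.
    answers = [a for r in records for a in r["answers"]]
    # Lowercase the first token so "What" and "what" share one bucket.
    first_words = Counter(q.split()[0].lower() for q in questions if q.split())
    return {
        "num_questions": len(questions),
        "num_answers": len(answers),
        "unique_questions": len(set(questions)),
        "unique_answers": len(set(answers)),
        "first_word_counts": first_words,
    }


# Made-up records for illustration only; not taken from ScanQA.
sample = [
    {"question": "What is next to the sofa?", "answers": ["table", "coffee table"]},
    {"question": "Where is the lamp?", "answers": ["on the desk"]},
    {"question": "What is next to the sofa?", "answers": ["table"]},
]

stats = summarize(sample)
print(stats)
```

Uniqueness here means exact string equality; any normalization (case, punctuation, whitespace) applied before counting would change the totals, and the excerpt does not say which convention was used.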