EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

Julian Straub, Daniel DeTone, Tianwei Shen, Nan Yang, Chris Sweeney, Richard Newcombe

2024-06-14
Multi-View 3D Reconstruction, 3D Reconstruction, 3D Object Detection, Object Detection

Abstract

The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs), we establish EFM3D, a benchmark with two core 3D egocentric perception tasks. EFM3D is the first benchmark for 3D object detection and surface regression on high-quality annotated egocentric data from Project Aria. We propose Egocentric Voxel Lifting (EVL), a baseline for 3D EFMs. EVL leverages all available egocentric modalities and inherits foundational capabilities from 2D foundation models. This model, trained on a large simulated dataset, outperforms existing methods on the EFM3D benchmark.

Results

Task	Dataset	Metric	Value	Model
3D Reconstruction	Aria Synthetic Environments	Accuracy	5.7	EVL
3D Reconstruction	Aria Synthetic Environments	Completeness	87.7	EVL
3D Reconstruction	Aria Synthetic Environments	Precision	82.2	EVL
3D Reconstruction	Aria Synthetic Environments	Recall	10.6	EVL
3D Reconstruction	Aria Digital Twin Dataset	Accuracy	18.2	EVL
3D Reconstruction	Aria Digital Twin Dataset	Completeness	3.105	EVL
3D Reconstruction	Aria Digital Twin Dataset	Precision	59.4	EVL
Object Detection	Aria Everyday Objects	mAP	22	EVL
Object Detection	Aria Everyday Objects	mAP	16	3DETR
Object Detection	Aria Everyday Objects	mAP	15	ImVoxelNet
Object Detection	Aria Everyday Objects	mAP	8	Cube R-CNN
Object Detection	Aria Synthetic Environments	mAP	75	EVL
Object Detection	Aria Synthetic Environments	mAP	64	ImVoxelNet
Object Detection	Aria Synthetic Environments	mAP	36	Cube R-CNN
Object Detection	Aria Synthetic Environments	mAP	33	3DETR
