Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection

Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun

2023-08-17 | ICCV 2023 | Tasks: 3D Object Detection, Object Detection

Abstract

We propose ImGeoNet, a multi-view image-based 3D object detection framework that models a 3D space with an image-induced geometry-aware voxel representation. Unlike previous methods, which aggregate 2D features into 3D voxels without considering geometry, ImGeoNet learns to induce geometry from multi-view images to alleviate the confusion arising from voxels of free space; during the inference phase, only images from multiple views are required. In addition, our representation can leverage a powerful pre-trained 2D feature extractor, leading to more robust performance. To evaluate the effectiveness of ImGeoNet, we conduct quantitative and qualitative experiments on three indoor datasets, namely ARKitScenes, ScanNetV2, and ScanNet200. The results demonstrate that ImGeoNet outperforms the current state-of-the-art multi-view image-based method, ImVoxelNet, on all three datasets in terms of detection accuracy. ImGeoNet is also data efficient: with only 40 views, it achieves results comparable to those of ImVoxelNet with 100 views. Furthermore, our studies indicate that the proposed image-induced geometry-aware representation can enable image-based methods to attain detection accuracy superior to that of the seminal point cloud-based method, VoteNet, in two practical scenarios: (1) scenarios where point clouds are sparse and noisy, such as in ARKitScenes, and (2) scenarios involving diverse object classes, particularly classes of small objects, as is the case in ScanNet200.
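The contrast the abstract draws, between aggregating 2D features into voxels with no regard for geometry and a geometry-aware alternative, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified axis-aligned pinhole camera, nearest-pixel feature sampling, and a per-voxel weight in [0, 1] supplied by hand, where a real system would presumably learn it from the images. The names `View`, `sample_feature`, `aggregate_voxel`, `geometry_aware_voxel`, and `surface_prob` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class View:
    """Hypothetical axis-aligned pinhole camera looking along +z."""
    position: Vec3
    focal: float
    feature_map: List[List[List[float]]]  # [row][col] -> 2D feature vector


def sample_feature(view: View, point: Vec3) -> Optional[List[float]]:
    """Project a world point into the view; return the feature at the nearest pixel."""
    x, y, z = (point[i] - view.position[i] for i in range(3))
    if z <= 0:  # point is behind the camera
        return None
    rows, cols = len(view.feature_map), len(view.feature_map[0])
    # Assumption: the principal point is the image centre.
    u = int(round(view.focal * x / z + (cols - 1) / 2))
    v = int(round(view.focal * y / z + (rows - 1) / 2))
    if 0 <= v < rows and 0 <= u < cols:
        return view.feature_map[v][u]
    return None  # point projects outside the image


def aggregate_voxel(center: Vec3, views: Sequence[View], dim: int) -> List[float]:
    """Geometry-agnostic aggregation: mean feature over the views that see the voxel."""
    hits = [f for f in (sample_feature(v, center) for v in views) if f is not None]
    if not hits:
        return [0.0] * dim
    return [sum(f[k] for f in hits) / len(hits) for k in range(dim)]


def geometry_aware_voxel(center: Vec3, views: Sequence[View], dim: int,
                         surface_prob: float) -> List[float]:
    """Scale the aggregated feature by a per-voxel weight (assumed given, not learned here)."""
    return [surface_prob * x for x in aggregate_voxel(center, views, dim)]


# Two 1x1 feature maps with 2-dimensional features, seen from two camera positions.
views = [
    View((0.0, 0.0, 0.0), 1.0, [[[1.0, 0.0]]]),
    View((0.0, 0.0, -1.0), 1.0, [[[3.0, 2.0]]]),
]
center = (0.0, 0.0, 2.0)

plain = aggregate_voxel(center, views, dim=2)             # [2.0, 1.0]
free_space = geometry_aware_voxel(center, views, 2, 0.0)  # [0.0, 0.0]
on_surface = geometry_aware_voxel(center, views, 2, 0.9)  # roughly [1.8, 0.9]
```

Without the weight, a voxel in empty space receives the same averaged feature as a voxel on an object along the same camera rays. With it, the free-space voxel is suppressed, which is one way to read "alleviate the confusion arising from voxels of free space".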

Results

Task                 Dataset    Metric    Value  Model
Object Detection     ScanNetV2  mAP@0.25  54.8   ImGeoNet (RGB only)
Object Detection     ScanNetV2  mAP@0.5   28.4   ImGeoNet (RGB only)
3D                   ScanNetV2  mAP@0.25  54.8   ImGeoNet (RGB only)
3D                   ScanNetV2  mAP@0.5   28.4   ImGeoNet (RGB only)
3D Object Detection  ScanNetV2  mAP@0.25  54.8   ImGeoNet (RGB only)
3D Object Detection  ScanNetV2  mAP@0.5   28.4   ImGeoNet (RGB only)
2D Classification    ScanNetV2  mAP@0.25  54.8   ImGeoNet (RGB only)
2D Classification    ScanNetV2  mAP@0.5   28.4   ImGeoNet (RGB only)
2D Object Detection  ScanNetV2  mAP@0.25  54.8   ImGeoNet (RGB only)
2D Object Detection  ScanNetV2  mAP@0.5   28.4   ImGeoNet (RGB only)
16k                  ScanNetV2  mAP@0.25  54.8   ImGeoNet (RGB only)
16k                  ScanNetV2  mAP@0.5   28.4   ImGeoNet (RGB only)
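
The suffixes in mAP@0.25 and mAP@0.5 are 3D intersection-over-union (IoU) thresholds: a predicted box counts as a correct detection only if its IoU with a ground-truth box reaches the threshold. The sketch below shows that matching step only, assuming axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax) and an inclusive `>=` comparison. The helper names `volume`, `iou_3d`, and `is_match` are hypothetical, and the full mAP computation (ranking by confidence, per-class precision-recall curves) is not shown.

```python
from typing import Tuple

# Assumed box layout: (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned.
Box = Tuple[float, float, float, float, float, float]


def volume(b: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    return max(0.0, b[3] - b[0]) * max(0.0, b[4] - b[1]) * max(0.0, b[5] - b[2])


def iou_3d(a: Box, b: Box) -> float:
    """Intersection volume divided by union volume."""
    inter = 1.0
    for k in range(3):
        overlap = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if overlap <= 0:  # no overlap along this axis means no intersection
            return 0.0
        inter *= overlap
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float) -> bool:
    """True if the prediction overlaps the ground truth enough to count as a hit."""
    return iou_3d(pred, gt) >= threshold


gt = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
pred = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)  # same size, shifted by half its width

loose = is_match(pred, gt, 0.25)   # True: IoU is 4 / 12, about 0.33
strict = is_match(pred, gt, 0.5)   # False
```

A box shifted by half its width along one axis passes at 0.25 but fails at 0.5, so the stricter threshold demands more precise localisation. That is consistent with the gap between the two values reported in the table (54.8 versus 28.4).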

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)