Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MVTN: Multi-View Transformation Network for 3D Shape Recognition

Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

2020-11-26 · ICCV 2021
Tasks: 3D Classification · Multi-View 3D Shape Retrieval · 3D Shape Retrieval · Retrieval · 3D Shape Recognition · 3D Shape Classification · 3D Point Cloud Classification · 3D Object Retrieval
Paper · PDF · Code · Code (official)

Abstract

Multi-view projection methods have demonstrated their ability to reach state-of-the-art performance on 3D shape recognition. Those methods learn different ways to aggregate information from multiple views. However, the camera view-points for those views tend to be heuristically set and fixed for all shapes. To circumvent the lack of dynamism of current multi-view methods, we propose to learn those view-points. In particular, we introduce the Multi-View Transformation Network (MVTN) that regresses optimal view-points for 3D shape recognition, building upon advances in differentiable rendering. As a result, MVTN can be trained end-to-end along with any multi-view network for 3D shape classification. We integrate MVTN in a novel adaptive multi-view pipeline that can render either 3D meshes or point clouds. MVTN exhibits clear performance gains in the tasks of 3D shape classification and 3D shape retrieval without the need for extra training supervision. In these tasks, MVTN achieves state-of-the-art performance on ModelNet40, ShapeNet Core55, and the most recent and realistic ScanObjectNN dataset (up to 6% improvement). Interestingly, we also show that MVTN can provide network robustness against rotation and occlusion in the 3D domain. The code is available at https://github.com/ajhamdi/MVTN.
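The core idea in the abstract — predict per-shape view-points instead of fixing them heuristically, then render the shape from those views — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the real MVTN regresses (azimuth, elevation) with a learned point-encoder MLP and uses a differentiable renderer (the official code builds on PyTorch3D), whereas here the view-points are given explicitly and "rendering" is a plain orthographic projection. The names `view_rotation` and `project_views` are hypothetical.

```python
import numpy as np

def view_rotation(azim, elev):
    """Rotation matrix for a camera at the given azimuth/elevation (radians)."""
    ca, sa = np.cos(azim), np.sin(azim)
    ce, se = np.cos(elev), np.sin(elev)
    rot_z = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])  # azimuth
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, ce, -se], [0.0, se, ce]])  # elevation
    return rot_x @ rot_z

def project_views(points, viewpoints):
    """Project a point cloud from each view-point (orthographic, as a stand-in
    for the differentiable renderer).

    points:     (N, 3) point cloud
    viewpoints: (M, 2) per-view (azimuth, elevation) -- in MVTN these would be
                regressed from the shape itself rather than fixed for all shapes
    returns:    (M, N, 2) image-plane coordinates, one set per view
    """
    views = []
    for azim, elev in viewpoints:
        rotated = points @ view_rotation(azim, elev).T
        views.append(rotated[:, :2])  # drop the depth axis
    return np.stack(views)

# Toy shape: 128 random points; 4 evenly spaced views, like a fixed circular
# baseline that MVTN's learned view-points would replace.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))
circular = np.stack([np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False),
                     np.full(4, np.pi / 6)], axis=1)
images = project_views(cloud, circular)
print(images.shape)  # (4, 128, 2): one 2D "view" per camera
```

The resulting (M, N, 2) stack is what a multi-view backbone would consume after rasterization; making the projection differentiable is what lets gradients flow back into the view-point predictor so the whole pipeline trains end-to-end.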

Results

Task | Dataset | Metric | Value | Model
3D | ShapeNetCore 55 | Mean AP | 82.9 | MVTN
3D | ModelNet40 | Mean AP | 92.9 | MVTN
Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 92.6 | MVTN
Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 92.3 | MVTN
Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 82.8 | MVTN
Shape Representation Of 3D Point Clouds | ModelNet40 | Mean Accuracy | 92.2 | MVTN
Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 93.8 | MVTN
3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 92.6 | MVTN
3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 92.3 | MVTN
3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 82.8 | MVTN
3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 92.2 | MVTN
3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 93.8 | MVTN
3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 92.6 | MVTN
3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 92.3 | MVTN
3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 82.8 | MVTN
3D Point Cloud Reconstruction | ModelNet40 | Mean Accuracy | 92.2 | MVTN
3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 93.8 | MVTN

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)