4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding

Yujin Chen, Matthias Nießner, Angela Dai

2021-12-06

Unsupervised Pre-training, 3D Instance Segmentation, Representation Learning, Data Augmentation, Scene Understanding, Segmentation, Semantic Segmentation, Contrastive Learning, Instance Segmentation, 3D Semantic Segmentation, Object Detection


Abstract

We present a new approach to instill 4D dynamic object priors into learned 3D representations by unsupervised pre-training. We observe that dynamic movement of an object through an environment provides important cues about its objectness, and thus propose to imbue learned 3D representations with such dynamic understanding, which can then be effectively transferred to improve performance in downstream 3D semantic scene understanding tasks. We propose a new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D environments, and employ contrastive learning under 3D-4D constraints that encode 4D invariances into the learned 3D representations. Experiments demonstrate that our unsupervised representation learning improves downstream 3D semantic segmentation, object detection, and instance segmentation, and notably improves performance in data-scarce scenarios.
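To make the contrastive objective mentioned in the abstract more concrete, the following is a minimal sketch of a generic InfoNCE-style loss over matched feature pairs. It is not the authors' implementation: the abstract does not give the exact loss or how the 3D-4D correspondences are built, so the function name `info_nce_loss`, the `temperature` value, and the assumption that row i of each array holds features of the same point seen in two views (for example, a 3D frame and a 4D sequence) are all illustrative.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss; row i of `anchors` is assumed to match row i of `positives`.

    Hypothetical helper for illustration only, not the paper's loss.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # (N, N) similarity matrix; diagonal entries are the positive pairs,
    # every other entry in a row acts as a negative.
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Usage: correctly matched features should give a lower loss than mismatched ones.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(feats, feats + 0.01 * rng.normal(size=(8, 16)))
loss_mismatched = info_nce_loss(feats, feats[::-1].copy())
```

Minimizing such a loss pulls corresponding features together and pushes non-corresponding ones apart, which is the general mechanism by which invariances can be encoded into learned features.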

Results

Task	Dataset	Metric	Value	Model
Instance Segmentation	ScanNet(v2)	mAP @ 50	57.6	4DContrast
3D Instance Segmentation	ScanNet(v2)	mAP @ 50	57.6	4DContrast
