Data sourced from the PWC Archive (CC-BY-SA 4.0).

4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

Christopher Choy, JunYoung Gwak, Silvio Savarese

2019-04-18 · CVPR 2019
Tasks: Robust 3D Semantic Segmentation, Semantic Segmentation, 3D Semantic Segmentation
Links: Paper · PDF · Code (8 implementations, 1 official)

Abstract

In many robotics and VR/AR applications, 3D-videos are readily available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution that encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks, and are faster than the 3D counterpart in some cases.
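As we understand the paper's formulation, the generalized sparse convolution computes, for every coordinate u in a freely chosen output coordinate set C_out, x_out(u) = Σ W_i · x_in(u + i), where the sum runs only over kernel offsets i for which u + i is an occupied input coordinate in C_in. The pure-Python sketch below is a naive illustration of that formula under our own assumptions, not the authors' library: a sparse tensor is modelled as a dict from integer coordinates to feature vectors, the last axis is assumed to be time, and the helper names (`cubic_kernel`, `hybrid_kernel`, `generalized_sparse_conv`) are hypothetical. `hybrid_kernel` follows our reading of the hybrid kernel (a cubic kernel over the three spatial axes combined with a cross along the time axis); the point it illustrates is that the hybrid kernel is just another offset set passed to the same convolution.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]        # integer coordinate, e.g. (x, y, z, t) in 4D
Vector = List[float]           # feature vector stored at one coordinate
Matrix = List[List[float]]     # weight matrix W_i with shape (c_out, c_in)


def cubic_kernel(dim: int, radius: int = 1) -> List[Coord]:
    """All offsets in the hypercube [-radius, radius]^dim (3^dim offsets when radius=1)."""
    return list(product(range(-radius, radius + 1), repeat=dim))


def hybrid_kernel(spatial_dim: int = 3, radius: int = 1) -> List[Coord]:
    """Hypothetical helper: cubic offsets over space at dt = 0, plus a cross along time.

    Assumes the last axis of every coordinate is time.
    """
    spatial = [off + (0,) for off in cubic_kernel(spatial_dim, radius)]
    temporal = [(0,) * spatial_dim + (dt,)
                for dt in range(-radius, radius + 1) if dt != 0]
    return spatial + temporal


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def generalized_sparse_conv(features: Dict[Coord, Vector],
                            weights: Dict[Coord, Matrix],
                            out_coords: Sequence[Coord]) -> Dict[Coord, Vector]:
    """Naive x_out[u] = sum over offsets i with u + i in C_in of W_i @ x_in[u + i].

    `features` holds only occupied input coordinates (C_in); `weights` maps each
    kernel offset i to W_i and must be non-empty; `out_coords` is C_out, which
    need not equal C_in.
    """
    c_out = len(next(iter(weights.values())))
    out: Dict[Coord, Vector] = {}
    for u in out_coords:
        acc = [0.0] * c_out
        for i, w in weights.items():
            x = features.get(tuple(a + b for a, b in zip(u, i)))
            if x is None:  # u + i is unoccupied: skipped, nothing is zero-padded
                continue
            acc = [a + b for a, b in zip(acc, matvec(w, x))]
        out[u] = acc
    return out


if __name__ == "__main__":
    print(len(cubic_kernel(4)), len(hybrid_kernel()))  # 81 29
    # The same spatial point observed at two consecutive time steps.
    feats = {(0, 0, 0, 0): [1.0], (0, 0, 0, 1): [2.0]}
    # 1x1 weight matrices so every reachable neighbour contributes with weight 1.
    weights = {off: [[1.0]] for off in hybrid_kernel()}
    # C_out adds (1, 0, 0, 0), a coordinate that is not occupied in the input.
    out = generalized_sparse_conv(feats, weights,
                                  [(0, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0)])
    print(out)  # {(0, 0, 0, 0): [3.0], (0, 0, 0, 1): [3.0], (1, 0, 0, 0): [1.0]}
```

With the full 3^4 = 81-offset cubic kernel, the output at (1, 0, 0, 0) would also reach the t = 1 point through the diagonal offset (-1, 0, 0, 1); the 29-offset hybrid set does not, so it needs fewer weight matrices and fewer multiply-adds per output point.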

Results

Task: Semantic Segmentation

Dataset        Metric              Value   Model
ScanNet        test mIoU           73.4    MinkowskiNet
ScanNet        val mIoU            72.2    MinkowskiNet
S3DIS Area5    mAcc                71.7    MinkowskiNet
S3DIS Area5    mIoU                65.4    MinkowskiNet
S3DIS          Mean IoU            65.4    MinkowskiNet
S3DIS          Params (M)          37.9    MinkowskiNet
ScanNet200     test mIoU           25.3    MinkUNet
ScanNet200     val mIoU            25      MinkUNet
STPLS3D        mIoU                51.3    MinkowskiNet
WildScenes     mIoU                36.53   MinkUNet
WildScenes     mIoU (Env DA)       30.78   MinkUNet
WildScenes     mIoU (Temporal DA)  27.2    MinkUNet
ScanNet++      Top-1 IoU           0.456   SpUNet (MinkowskiNet)
ScanNet++      Top-3 IoU           0.683   SpUNet (MinkowskiNet)
ScribbleKITTI  mIoU                55      MinkowskiNet

The ScanNet200, STPLS3D, WildScenes, ScanNet++ and ScribbleKITTI entries are also listed, with identical values, under the 3D Semantic Segmentation task.
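Most results above are reported as mIoU. As a rough illustration of the usual definition, each class c gets IoU_c = TP / (TP + FP + FN) from a confusion matrix, and the per-class values are averaged; most entries are given as percentages (e.g. 73.4), while the ScanNet++ values appear as fractions (0.456). The helpers below (`confusion_matrix`, `per_class_iou`, `mean_iou`) are hypothetical and are not any benchmark's official evaluation script. Benchmarks differ in details such as the ignore label and how a class absent from both ground truth and prediction is treated; this sketch assumes an ignore label of -1 and simply skips such classes.

```python
from typing import List, Optional, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_label: int = -1) -> List[List[int]]:
    """cm[g][p] counts points whose ground-truth class is g and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:  # points without a valid label are not scored
            continue
        cm[g][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU_c = TP / (TP + FP + FN); None when class c never occurs in gt or pred."""
    ious: List[Optional[float]] = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # class-c points predicted as another class
        fp = sum(row[c] for row in cm) - tp  # other points predicted as class c
        union = tp + fp + fn
        ious.append(tp / union if union > 0 else None)
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over classes that occur; this sketch skips empty classes."""
    valid = [iou for iou in per_class_iou(cm) if iou is not None]
    return sum(valid) / len(valid) if valid else 0.0


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2, -1]
    pred = [0, 1, 1, 1, 2, 0, 0]
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(cm)                            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
    print(round(100 * mean_iou(cm), 1))  # 50.0
```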
