Single Frame Semantic Segmentation Using Multi-Modal Spherical Images

Suresh Guttikonda, Jason Rambach

2023-08-18 | Semantic Segmentation

Abstract

In recent years, the research community has shown strong interest in panoramic images, which offer a 360-degree directional perspective. To fully realize their potential, multiple data modalities can be fed to a model and their complementary characteristics exploited for more robust and richer scene interpretation based on semantic segmentation. Existing research, however, has mostly concentrated on pinhole RGB-X semantic segmentation. In this study, we propose a transformer-based cross-modal fusion architecture to bridge the gap between multi-modal fusion and omnidirectional scene perception. We employ distortion-aware modules to address the extreme object deformations and panorama distortions that result from the equirectangular representation. Additionally, we conduct cross-modal interactions for feature rectification and information exchange before merging the features, in order to communicate long-range contexts for bi-modal and tri-modal feature streams. In thorough tests using combinations of four different modality types on three indoor panoramic-view datasets, our technique achieved state-of-the-art mIoU performance: 60.60% on Stanford2D3DS (RGB-HHA), 71.97% on Structured3D (RGB-D-N), and 35.92% on Matterport3D (RGB-D). We plan to release all code and trained models soon.
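To make the distortion mentioned in the abstract concrete, the following is a minimal sketch of the standard equirectangular pixel-to-sphere mapping and the horizontal stretch it introduces near the poles. It is not the paper's distortion-aware module; the function names `pixel_to_sphere` and `horizontal_stretch` and the pixel-centre convention are assumptions made for illustration.

```python
import math


def pixel_to_sphere(u, v, width, height):
    """Map the centre of pixel (u, v) to (longitude, latitude) in radians.

    Assumed convention: u grows to the right, v grows downward,
    longitude lies in [-pi, pi) and latitude in (-pi/2, pi/2).
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat


def horizontal_stretch(v, height):
    """Horizontal stretch of image row v relative to the equator.

    Every row has the same pixel width, but a circle of latitude
    shrinks by cos(lat) on the sphere, so content is stretched by 1 / cos(lat).
    """
    _, lat = pixel_to_sphere(0, v, 1, height)
    return 1.0 / math.cos(lat)


if __name__ == "__main__":
    height = 512
    # Rows near the top of the image are stretched far more than rows near the equator.
    for v in (0, 64, 128, 255):
        print(f"row {v}: stretch factor {horizontal_stretch(v, height):.2f}")
```

Rows close to the equator have a stretch factor near 1, while rows close to the poles are stretched heavily, which is why objects near the top and bottom of a panorama appear strongly deformed.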

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | Structured3D | Test mIoU | 71.97 | SFSS-MMSI (RGB+Depth+Normal) |
| Semantic Segmentation | Structured3D | Validation mIoU | 75.86 | SFSS-MMSI (RGB+Depth+Normal) |
| Semantic Segmentation | Structured3D | Test mIoU | 71 | SFSS-MMSI (RGB+Normal) |
| Semantic Segmentation | Structured3D | Validation mIoU | 74.38 | SFSS-MMSI (RGB+Normal) |
| Semantic Segmentation | Structured3D | Test mIoU | 70.17 | SFSS-MMSI (RGB+Depth) |
| Semantic Segmentation | Structured3D | Validation mIoU | 73.78 | SFSS-MMSI (RGB+Depth) |
| Semantic Segmentation | Structured3D | Test mIoU | 68.34 | SFSS-MMSI (RGB Only) |
| Semantic Segmentation | Structured3D | Validation mIoU | 71.94 | SFSS-MMSI (RGB Only) |
| Semantic Segmentation | Stanford2D3D Panoramic | mAcc | 70.68 | SFSS-MMSI (RGB+HHA) |
| Semantic Segmentation | Stanford2D3D Panoramic | mAcc | 69.03 | SFSS-MMSI (RGB+Depth+Normal) |
| Semantic Segmentation | Stanford2D3D Panoramic | mAcc | 68.79 | SFSS-MMSI (RGB+Normal) |
| Semantic Segmentation | Stanford2D3D Panoramic | mAcc | 68.57 | SFSS-MMSI (RGB+Depth) |
| Semantic Segmentation | Stanford2D3D Panoramic | mAcc | 63.96 | SFSS-MMSI (RGB Only) |
| Semantic Segmentation | Matterport3D | Test mIoU | 35.92 | SFSS-MMSI (RGB+Depth) |
| Semantic Segmentation | Matterport3D | Validation mIoU | 39.19 | SFSS-MMSI (RGB+Depth) |
| Semantic Segmentation | Matterport3D | Test mIoU | 35.77 | SFSS-MMSI (RGB+Normal) |
| Semantic Segmentation | Matterport3D | Validation mIoU | 38.91 | SFSS-MMSI (RGB+Normal) |
| Semantic Segmentation | Matterport3D | Test mIoU | 35.52 | SFSS-MMSI (RGB+Depth+Normal) |
| Semantic Segmentation | Matterport3D | Validation mIoU | 39.26 | SFSS-MMSI (RGB+Depth+Normal) |
| Semantic Segmentation | Matterport3D | Test mIoU | 31.3 | SFSS-MMSI (RGB Only) |
| Semantic Segmentation | Matterport3D | Validation mIoU | 35.15 | SFSS-MMSI (RGB Only) |
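
The results above are reported as mIoU and mAcc. As a reminder of how these two metrics differ, here is a minimal sketch that computes both from a confusion matrix over flat label lists. It is not the paper's evaluation code: the helper names are hypothetical, and skipping classes that never occur (rather than counting them as zero) is an assumption, since evaluation protocols vary between benchmarks.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN).

    Assumption: classes with no ground-truth and no predicted pixels are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_acc(cm):
    """Mean over classes of per-class accuracy TP / (TP + FN).

    Assumption: classes with no ground-truth pixels are skipped.
    """
    accs = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total > 0:
            accs.append(row[c] / total)
    return sum(accs) / len(accs) if accs else 0.0


if __name__ == "__main__":
    # Toy labels for six pixels and three classes (illustrative data only).
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    print("mIoU:", mean_iou(cm))
    print("mAcc:", mean_acc(cm))
```

Because IoU also penalizes false positives, mIoU is never higher than mAcc for the same predictions, so the mAcc values listed for Stanford2D3D Panoramic are not directly comparable with the mIoU values listed for the other datasets.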

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)