Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

State Space Model Meets Transformer: A New Paradigm for 3D Object Detection

Chuxin Wang, Wenfei Yang, Xiang Liu, Tianzhu Zhang

2025-03-18 | International Conference on Learning Representations 2025 | object-detection, 3D Object Detection, Object Detection

Abstract

DETR-based methods, which use multi-layer transformer decoders to refine object queries iteratively, have shown promising performance in 3D indoor object detection. However, the scene point features in the transformer decoder remain fixed, so later decoder layers contribute little, which limits performance improvement. Recently, State Space Models (SSMs) have shown efficient context modeling ability with linear complexity through iterative interactions between system states and inputs. Inspired by SSMs, we propose a new 3D object DEtection paradigm with an interactive STate space model (DEST). In the interactive SSM, we design a novel state-dependent SSM parameterization method that enables system states to effectively serve as queries in 3D indoor detection tasks. In addition, we introduce four key designs tailored to the characteristics of point clouds and SSMs: the serialization and bidirectional scanning strategies enable bidirectional feature interaction among scene points within the SSM; the inter-state attention mechanism models the relationships between state points; and the gated feed-forward network enhances inter-channel correlations. To the best of our knowledge, this is the first method to model queries as system states and scene points as system inputs, which can simultaneously update scene point features and query features with linear complexity. Extensive experiments on two challenging datasets demonstrate the effectiveness of our DEST-based method. Our method improves the GroupFree baseline in terms of AP50 on the ScanNet V2 (+5.3) and SUN RGB-D (+3.2) datasets. Based on the V-DETR baseline, our method sets a new SOTA on the ScanNet V2 and SUN RGB-D datasets.
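
The abstract's idea of treating queries as system states and scene points as system inputs can be illustrated with a toy recurrence. The sketch below is not the DEST parameterization, which the abstract does not specify. It assumes a fixed per-state decay, an exponential-moving-average state update, and a mean-pooled readout; the names `ssm_scan`, `bidirectional_scan`, and `decays` are hypothetical. It only shows how a single pass over the points can update both point features and state features in time linear in the number of points, and how a second pass in reverse order gives each point context from both directions.

```python
from typing import List, Tuple

Vector = List[float]


def ssm_scan(
    points: List[Vector], states: List[Vector], decays: List[float]
) -> Tuple[List[Vector], List[Vector]]:
    """One linear-time pass of a toy diagonal state space model.

    points: N scene-point features (system inputs), each of length D.
    states: K query features (system states), each of length D.
    decays: K values in [0, 1], one per state (an assumed, fixed parameter).
    Returns (updated_points, updated_states).
    """
    states = [list(s) for s in states]  # copy so the caller's data is untouched
    updated_points = []
    for x in points:  # single pass over the inputs: O(N * K * D)
        # State update: h_k <- a_k * h_k + (1 - a_k) * x
        for h, a in zip(states, decays):
            for d in range(len(x)):
                h[d] = a * h[d] + (1.0 - a) * x[d]
        # Readout: point feature plus the mean of the current states
        k = len(states)
        y = [x[d] + sum(h[d] for h in states) / k for d in range(len(x))]
        updated_points.append(y)
    return updated_points, states


def bidirectional_scan(
    points: List[Vector], states: List[Vector], decays: List[float]
) -> Tuple[List[Vector], List[Vector]]:
    """Average a forward scan and a backward scan over the same point order."""
    fwd_pts, fwd_states = ssm_scan(points, states, decays)
    bwd_pts, bwd_states = ssm_scan(points[::-1], states, decays)
    bwd_pts = bwd_pts[::-1]  # restore the original point order
    merged_pts = [
        [(f + b) / 2.0 for f, b in zip(fp, bp)] for fp, bp in zip(fwd_pts, bwd_pts)
    ]
    merged_states = [
        [(f + b) / 2.0 for f, b in zip(fs, bs)]
        for fs, bs in zip(fwd_states, bwd_states)
    ]
    return merged_pts, merged_states
```

In this toy version the cost grows linearly with the number of points N, in contrast to the quadratic cost of full self-attention among the points. The order in which a point cloud is flattened into a sequence (the serialization step) is left to the caller here.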

Results

| Dataset | Metric | Value | Model |
|---|---|---|---|
| SUN-RGBD val | mAP@0.25 | 69.2 | DEST (based on V-DETR) (TTA) |
| SUN-RGBD val | mAP@0.5 | 52.2 | DEST (based on V-DETR) (TTA) |
| SUN-RGBD val | mAP@0.25 | 65.3 | DEST (based on GroupFree3D) |
| SUN-RGBD val | mAP@0.5 | 48.4 | DEST (based on GroupFree3D) |
| ScanNetV2 | mAP@0.25 | 78.8 | DEST (based on V-DETR) (TTA) |
| ScanNetV2 | mAP@0.5 | 67.9 | DEST (based on V-DETR) (TTA) |
| ScanNetV2 | mAP@0.25 | 71.3 | DEST (based on GroupFree3D) |
| ScanNetV2 | mAP@0.5 | 58.1 | DEST (based on GroupFree3D) |

These eight results are listed identically under each of the following task labels: Object Detection, 3D, 3D Object Detection, 2D Classification, 2D Object Detection, 16k.
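
The mAP@0.25 and mAP@0.5 metrics above differ in the 3D intersection-over-union (IoU) a predicted box must reach with a ground-truth box to count as a correct detection. The following is a minimal sketch of that threshold check, assuming axis-aligned boxes given as `(x_min, y_min, z_min, x_max, y_max, z_max)`. The helper names `iou_3d` and `is_match` are hypothetical, the non-strict comparison is an assumption, and real evaluation code may also handle oriented boxes, which this sketch does not.

```python
from typing import Sequence

Box = Sequence[float]  # (x_min, y_min, z_min, x_max, y_max, z_max)


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    volume = 1.0
    for axis in range(3):
        volume *= max(0.0, box[axis + 3] - box[axis])
    return volume


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        intersection *= hi - lo
    union = box_volume(a) + box_volume(b) - intersection
    return intersection / union


def is_match(pred: Box, gt: Box, threshold: float) -> bool:
    """True if the prediction overlaps the ground truth enough (assumed >=)."""
    return iou_3d(pred, gt) >= threshold


# Two 2x2x2 boxes offset by 1 along x: intersection 4, union 12, IoU 1/3.
pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
gt = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
print(is_match(pred, gt, 0.25))  # True: counts as correct at the 0.25 threshold
print(is_match(pred, gt, 0.5))   # False: rejected at the stricter 0.5 threshold
```

The same prediction can therefore count toward mAP@0.25 but not toward mAP@0.5, which is why the mAP@0.5 values in the table are lower for every model and dataset.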

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)