Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu

2023-02-15, CVPR 2023
Tasks: Prediction Of Occupancy Grid Maps, Autonomous Driving, 3D Semantic Scene Completion

Abstract

Modern methods for vision-centric autonomous driving perception widely adopt the bird's-eye-view (BEV) representation to describe a 3D scene. Although BEV is more efficient than a voxel representation, a single plane has difficulty describing the fine-grained 3D structure of a scene. To address this, we propose a tri-perspective view (TPV) representation, which accompanies BEV with two additional perpendicular planes. We model each point in 3D space by summing its projected features on the three planes. To lift image features to the 3D TPV space, we further propose a transformer-based TPV encoder (TPVFormer) that obtains the TPV features effectively. We employ the attention mechanism to aggregate the image features corresponding to each query in each TPV plane. Experiments show that our model, trained with sparse supervision, effectively predicts the semantic occupancy for all voxels. We demonstrate for the first time that a model using only camera inputs can achieve performance comparable to LiDAR-based methods on the LiDAR segmentation task on nuScenes. Code: https://github.com/wzzheng/TPVFormer.
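
The summation of projected plane features described in the abstract can be illustrated with a minimal NumPy sketch. This is not the TPVFormer implementation: the array names (`tpv_hw`, `tpv_dh`, `tpv_wd`), the axis ordering, the grid sizes, and the use of plain integer indexing (instead of any interpolated sampling) are all assumptions made for illustration, and the random arrays stand in for features that an encoder would produce.

```python
import numpy as np

def tpv_point_feature(tpv_hw, tpv_dh, tpv_wd, h, w, d):
    """Sum the features found at a voxel's projections onto the three planes.

    Assumed (hypothetical) layouts: tpv_hw is (H, W, C), tpv_dh is (D, H, C),
    tpv_wd is (W, D, C). The voxel (h, w, d) projects to (h, w), (d, h), (w, d).
    """
    return tpv_hw[h, w] + tpv_dh[d, h] + tpv_wd[w, d]

def tpv_to_voxels(tpv_hw, tpv_dh, tpv_wd):
    """Broadcast the three planes into a dense (H, W, D, C) feature grid."""
    hw = tpv_hw[:, :, None, :]                     # (H, W, 1, C)
    dh = tpv_dh.transpose(1, 0, 2)[:, None, :, :]  # (H, 1, D, C)
    wd = tpv_wd[None, :, :, :]                     # (1, W, D, C)
    return hw + dh + wd

# Illustrative sizes only; random values stand in for learned plane features.
H, W, D, C = 4, 5, 3, 8
rng = np.random.default_rng(0)
tpv_hw = rng.normal(size=(H, W, C))
tpv_dh = rng.normal(size=(D, H, C))
tpv_wd = rng.normal(size=(W, D, C))

voxels = tpv_to_voxels(tpv_hw, tpv_dh, tpv_wd)
point = tpv_point_feature(tpv_hw, tpv_dh, tpv_wd, h=1, w=2, d=0)
```

In this sketch the three planes hold H·W + D·H + W·D feature vectors, whereas a dense voxel grid holds H·W·D, yet every voxel still receives its own feature through the sum.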

Results

Task	Dataset	Metric	Value	Model
3D Reconstruction	KITTI-360	mIoU	13.64	TPVFormer
Prediction Of Occupancy Grid Maps	nuScenes	mIoU	52.058	TPVFormer04
3D	KITTI-360	mIoU	13.64	TPVFormer
3D Semantic Scene Completion	KITTI-360	mIoU	13.64	TPVFormer

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan : Language-Guided Visual Path Planning with RLVR (2025-07-17)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
Towards Autonomous Riding: A Review of Perception, Planning, and Control in Intelligent Two-Wheelers (2025-07-16)