Seungho Lee, Hwijeong Lee, Hyunjung Shim
We address the challenges of semi-supervised LiDAR segmentation (SSLS), particularly in low-budget scenarios. Low-budget SSLS suffers from two main issues: poor-quality pseudo-labels for unlabeled data, and performance drops caused by the significant imbalance between ground-truth labels and pseudo-labels. This imbalance leads to a vicious training cycle. To overcome these challenges, we leverage a spatio-temporal prior, recognizing the substantial overlap between temporally adjacent LiDAR scans. We propose proximity-based label estimation, which generates highly accurate pseudo-labels for unlabeled data by exploiting semantic consistency with adjacent labeled data. We further enhance this method by progressively expanding the pseudo-labels outward from the nearest unlabeled scans, which significantly reduces errors linked to dynamic classes. We also employ a dual-branch structure to mitigate the performance degradation caused by data imbalance. Experimental results demonstrate remarkable performance in low-budget settings (i.e., ≤ 5% labeled data) and meaningful improvements in normal-budget settings (i.e., 5–50% labeled data). Our method achieves new state-of-the-art results on SemanticKITTI and nuScenes in semi-supervised LiDAR segmentation. With only 5% labeled data, it offers competitive results against fully-supervised counterparts. Moreover, on nuScenes it surpasses the previous state-of-the-art trained with 100% labeled data (75.2%) while using only 20% labeled data (76.0%). The code is available at https://github.com/halbielee/PLE.
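
To make the idea of proximity-based label estimation with progressive expansion concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes all scans have already been transformed into a common world frame, uses a brute-force nearest-neighbor search, and introduces hypothetical names (`propagate_labels`, `progressive_propagation`, `max_dist`, `IGNORE_LABEL`) purely for illustration.

```python
import numpy as np

IGNORE_LABEL = -1  # hypothetical marker for "no pseudo-label assigned"


def propagate_labels(src_xyz, src_cls, dst_xyz, max_dist=0.5):
    """Give each destination point the class of its nearest source point.

    Assumes both point sets share one coordinate frame (ego-motion already
    compensated). Points whose nearest source point is farther than
    `max_dist` (an assumed threshold, in metres) stay unlabeled.
    """
    # Pairwise squared distances, shape (num_dst, num_src).
    diff = dst_xyz[:, None, :] - src_xyz[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    nearest = sq_dist.argmin(axis=1)
    nearest_dist = np.sqrt(sq_dist[np.arange(len(dst_xyz)), nearest])
    return np.where(nearest_dist <= max_dist, src_cls[nearest], IGNORE_LABEL)


def progressive_propagation(labeled_xyz, labeled_cls, unlabeled_scans, max_dist=0.5):
    """Expand pseudo-labels scan by scan, starting from the nearest scan.

    `unlabeled_scans` is assumed to be ordered by temporal distance from the
    labeled scan. Each newly pseudo-labeled scan becomes the source for the
    next one, so a moving object only has to be matched across one time step.
    """
    src_xyz, src_cls = labeled_xyz, labeled_cls
    results = []
    for scan in unlabeled_scans:
        pseudo = propagate_labels(src_xyz, src_cls, scan, max_dist)
        results.append(pseudo)
        keep = pseudo != IGNORE_LABEL
        if keep.any():  # otherwise keep the previous source
            src_xyz, src_cls = scan[keep], pseudo[keep]
    return results


# Toy example: a static point (class 1) and an object (class 2) that
# moves 0.2 m and then a further 0.4 m along x between scans.
labeled_xyz = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
labeled_cls = np.array([1, 2])
scan_t1 = np.array([[0.1, 0.0, 0.0], [10.2, 0.0, 0.0], [5.0, 0.0, 0.0]])
scan_t2 = np.array([[0.4, 0.0, 0.0], [10.6, 0.0, 0.0]])

pseudo_t1, pseudo_t2 = progressive_propagation(
    labeled_xyz, labeled_cls, [scan_t1, scan_t2]
)
direct_t2 = propagate_labels(labeled_xyz, labeled_cls, scan_t2)
```

In this toy setup the moving object in `scan_t2` is 0.6 m from its labeled position, so direct propagation leaves it unlabeled, while the progressive variant reaches it through `scan_t1`. A real pipeline would likely replace the brute-force distance matrix with a spatial index, since full LiDAR scans contain on the order of 10^5 points.
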
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | SemanticKITTI | mIoU (0.5% Labels) | 52.2 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (1% Labels) | 61.1 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (2% Labels) | 62.9 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (5% Labels) | 62.8 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (10% Labels) | 63.1 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (20% Labels) | 64.1 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (50% Labels) | 64.3 | PLE (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (0.5% Labels) | 47.3 | LaserMix (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (2% Labels) | 59.2 | LaserMix (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (5% Labels) | 61.7 | LaserMix (Voxel) |
| Semantic Segmentation | SemanticKITTI | mIoU (0.5% Labels) | 46.2 | PLE (CENet, Range view) |
| Semantic Segmentation | SemanticKITTI | mIoU (1% Labels) | 51.5 | PLE (CENet, Range view) |
| Semantic Segmentation | SemanticKITTI | mIoU (2% Labels) | 54.3 | PLE (CENet, Range view) |
| Semantic Segmentation | SemanticKITTI | mIoU (5% Labels) | 58.1 | PLE (CENet, Range view) |
| Semantic Segmentation | nuScenes | mIoU (0.5% Labels) | 58 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (1% Labels) | 62.9 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (2% Labels) | 67.2 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (5% Labels) | 72.8 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (10% Labels) | 74.3 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (20% Labels) | 76 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (50% Labels) | 76.1 | PLE (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (0.5% Labels) | 51.4 | LaserMix (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (2% Labels) | 63.9 | LaserMix (Voxel) |
| Semantic Segmentation | nuScenes | mIoU (5% Labels) | 69.7 | LaserMix (Voxel) |
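
The table reports mean Intersection-over-Union (mIoU). As a rough illustration of how such a score can be computed from per-point predictions, here is a minimal sketch. The function name `mean_iou` and the `ignore_label` convention are assumptions made for this example; the official benchmark tools accumulate statistics over the whole evaluation set and may treat absent classes differently.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_label=-1):
    """Average per-class IoU over classes that occur in pred or target.

    Assumes `pred` holds class ids in [0, num_classes) and that points with
    `target == ignore_label` are excluded from the evaluation.
    """
    valid = target != ignore_label
    pred, target = pred[valid], target[valid]
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes**2
    ).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both pred and target
    return float((tp[present] / union[present]).mean())


target = np.array([0, 0, 1, 1, -1])
pred = np.array([0, 1, 1, 1, 0])
score = mean_iou(pred, target, num_classes=2)  # IoUs are 1/2 and 2/3
```
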