Lojze Žust, Janez Perš, Matej Kristan

Progress in maritime obstacle detection is hindered by the lack of a diverse dataset that adequately captures the complexity of general maritime environments. We present LaRS, the first maritime panoptic obstacle detection benchmark, featuring scenes from Lakes, Rivers and Seas. Our major contribution is the new dataset, which has the largest diversity in recording locations, scene types, obstacle classes, and acquisition conditions among related datasets. LaRS is composed of over 4000 per-pixel labeled key frames, each with nine preceding frames to allow utilization of the temporal texture, amounting to over 40k frames. Each key frame is annotated with 8 thing classes, 3 stuff classes, and 19 global scene attributes. We report the results of 27 semantic and panoptic segmentation methods, along with several performance insights and future research directions. To enable objective evaluation, we have implemented an online evaluation server. The LaRS dataset, evaluation toolkit and benchmark are publicly available at: https://lojzezust.github.io/lars-dataset

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | LaRS | F1 | 73.4 | KNet (Swin-T) |
| Semantic Segmentation | LaRS | Q | 71.3 | KNet (Swin-T) |
| Semantic Segmentation | LaRS | mIoU | 97.2 | KNet (Swin-T) |
| Semantic Segmentation | LaRS | μ | 78.8 | KNet (Swin-T) |
| Semantic Segmentation | LaRS | F1 | 70 | SegFormer (MiT-B2) |
| Semantic Segmentation | LaRS | Q | 67.8 | SegFormer (MiT-B2) |
| Semantic Segmentation | LaRS | mIoU | 96.8 | SegFormer (MiT-B2) |
| Semantic Segmentation | LaRS | μ | 78.6 | SegFormer (MiT-B2) |
| Semantic Segmentation | LaRS | F1 | 66.1 | DeepLabv3 (ResNet-101) |
| Semantic Segmentation | LaRS | Q | 62.9 | DeepLabv3 (ResNet-101) |
| Semantic Segmentation | LaRS | mIoU | 95.2 | DeepLabv3 (ResNet-101) |
| Semantic Segmentation | LaRS | μ | 77.5 | DeepLabv3 (ResNet-101) |
| Semantic Segmentation | LaRS | F1 | 65.4 | PointRend |
| Semantic Segmentation | LaRS | Q | 62.1 | PointRend |
| Semantic Segmentation | LaRS | mIoU | 94.9 | PointRend |
| Semantic Segmentation | LaRS | μ | 77.5 | PointRend |
| Semantic Segmentation | LaRS | F1 | 64 | DeepLabv3+ (ResNet-101) |
| Semantic Segmentation | LaRS | Q | 61 | DeepLabv3+ (ResNet-101) |
| Semantic Segmentation | LaRS | mIoU | 95.4 | DeepLabv3+ (ResNet-101) |
| Semantic Segmentation | LaRS | μ | 77.8 | DeepLabv3+ (ResNet-101) |
| Semantic Segmentation | LaRS | F1 | 64.3 | STDC2 |
| Semantic Segmentation | LaRS | Q | 60.8 | STDC2 |
| Semantic Segmentation | LaRS | mIoU | 94.5 | STDC2 |
| Semantic Segmentation | LaRS | μ | 76.5 | STDC2 |
| Semantic Segmentation | LaRS | F1 | 63.4 | FCN (ResNet-101) |
| Semantic Segmentation | LaRS | Q | 60.2 | FCN (ResNet-101) |
| Semantic Segmentation | LaRS | mIoU | 95 | FCN (ResNet-101) |
| Semantic Segmentation | LaRS | μ | 77.4 | FCN (ResNet-101) |
| Semantic Segmentation | LaRS | F1 | 61.6 | WaSR (ResNet-101) |
| Semantic Segmentation | LaRS | Q | 59.5 | WaSR (ResNet-101) |
| Semantic Segmentation | LaRS | mIoU | 96.6 | WaSR (ResNet-101) |
| Semantic Segmentation | LaRS | μ | 71 | WaSR (ResNet-101) |
| Semantic Segmentation | LaRS | F1 | 61.8 | STDC1 |
| Semantic Segmentation | LaRS | Q | 57.8 | STDC1 |
| Semantic Segmentation | LaRS | mIoU | 93.6 | STDC1 |
| Semantic Segmentation | LaRS | μ | 75.6 | STDC1 |
| Semantic Segmentation | LaRS | F1 | 57.9 | FCN (ResNet-50) |
| Semantic Segmentation | LaRS | Q | 53.6 | FCN (ResNet-50) |
| Semantic Segmentation | LaRS | mIoU | 92.6 | FCN (ResNet-50) |
| Semantic Segmentation | LaRS | μ | 76.8 | FCN (ResNet-50) |
| Semantic Segmentation | LaRS | F1 | 55.2 | Segmenter (ViT-B) |
| Semantic Segmentation | LaRS | Q | 52.6 | Segmenter (ViT-B) |
| Semantic Segmentation | LaRS | mIoU | 95.1 | Segmenter (ViT-B) |
| Semantic Segmentation | LaRS | μ | 72.2 | Segmenter (ViT-B) |
| Semantic Segmentation | LaRS | F1 | 54.7 | BiSeNetv2 |
| Semantic Segmentation | LaRS | Q | 51.2 | BiSeNetv2 |
| Semantic Segmentation | LaRS | mIoU | 93.5 | BiSeNetv2 |
| Semantic Segmentation | LaRS | μ | 73.9 | BiSeNetv2 |
| Semantic Segmentation | LaRS | F1 | 47.5 | WODIS (ResNet-101) |
| Semantic Segmentation | LaRS | Q | 40.7 | WODIS (ResNet-101) |
| Semantic Segmentation | LaRS | mIoU | 85.7 | WODIS (ResNet-101) |
| Semantic Segmentation | LaRS | μ | 63 | WODIS (ResNet-101) |
| Semantic Segmentation | LaRS | F1 | 42.8 | BiSeNetv1 (ResNet-50) |
| Semantic Segmentation | LaRS | Q | 39.4 | BiSeNetv1 (ResNet-50) |
| Semantic Segmentation | LaRS | mIoU | 92.2 | BiSeNetv1 (ResNet-50) |
| Semantic Segmentation | LaRS | μ | 73.3 | BiSeNetv1 (ResNet-50) |
| Semantic Segmentation | LaRS | F1 | 44.9 | IntCatchAI |
| Semantic Segmentation | LaRS | Q | 20.5 | IntCatchAI |
| Semantic Segmentation | LaRS | mIoU | 45.6 | IntCatchAI |
| Semantic Segmentation | LaRS | μ | 62.4 | IntCatchAI |
| Semantic Segmentation | LaRS | F1 | 15.4 | UNet |
| Semantic Segmentation | LaRS | Q | 13.9 | UNet |
| Semantic Segmentation | LaRS | mIoU | 90.1 | UNet |
| Semantic Segmentation | LaRS | μ | 75.7 | UNet |
| Video Semantic Segmentation | LaRS | F1 | 62.1 | WaSR-T (ResNet-101) |
| Video Semantic Segmentation | LaRS | Q | 60.1 | WaSR-T (ResNet-101) |
| Video Semantic Segmentation | LaRS | mIoU | 96.7 | WaSR-T (ResNet-101) |
| Video Semantic Segmentation | LaRS | μ | 71.1 | WaSR-T (ResNet-101) |
| Video Semantic Segmentation | LaRS | F1 | 61.1 | TMANet (ResNet-50) |
| Video Semantic Segmentation | LaRS | Q | 57.5 | TMANet (ResNet-50) |
| Video Semantic Segmentation | LaRS | mIoU | 94.1 | TMANet (ResNet-50) |
| Video Semantic Segmentation | LaRS | μ | 77.1 | TMANet (ResNet-50) |
| Video Semantic Segmentation | LaRS | F1 | 52.1 | CSANet (ResNet-101) |
| Video Semantic Segmentation | LaRS | Q | 49.1 | CSANet (ResNet-101) |
| Video Semantic Segmentation | LaRS | mIoU | 94.2 | CSANet (ResNet-101) |
| Video Semantic Segmentation | LaRS | μ | 63.7 | CSANet (ResNet-101) |
| Panoptic Segmentation | LaRS | PQ | 41.7 | Mask2Former (Swin-B) |
| Panoptic Segmentation | LaRS | PQ | 40.1 | Panoptic FPN (ResNet-50) |
| Panoptic Segmentation | LaRS | PQ | 39.2 | Mask2Former (Swin-T) |
| Panoptic Segmentation | LaRS | PQ | 38.7 | Panoptic FPN (ResNet-101) |
| Panoptic Segmentation | LaRS | PQ | 37.6 | Mask2Former (ResNet-50) |
| Panoptic Segmentation | LaRS | PQ | 37.2 | Mask2Former (ResNet-101) |
| Panoptic Segmentation | LaRS | PQ | 34.7 | Panoptic Deeplab (ResNet-50) |
| Panoptic Segmentation | LaRS | PQ | 31.9 | MaX-DeepLab |
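
The table reports Q alongside mIoU and F1 without defining it. The reported numbers are consistent with Q being the product of mIoU and F1 (all in percent). The sketch below checks that relationship on a few rows copied from the table. The formula is an assumption inferred from the values, not a definition given on this page, and the function names are hypothetical.

```python
# (mIoU, F1, Q) triples copied from the results table, all in percent.
ROWS = {
    "KNet (Swin-T)": (97.2, 73.4, 71.3),
    "WaSR-T (ResNet-101)": (96.7, 62.1, 60.1),
    "IntCatchAI": (45.6, 44.9, 20.5),
    "UNet": (90.1, 15.4, 13.9),
}


def combined_quality(miou: float, f1: float) -> float:
    """Assumed relationship: Q = mIoU * F1, with inputs and output in percent."""
    return miou * f1 / 100.0


def max_deviation(rows: dict) -> float:
    """Largest absolute gap between the computed and the reported Q."""
    return max(abs(combined_quality(m, f) - q) for m, f, q in rows.values())


if __name__ == "__main__":
    for name, (miou, f1, q) in ROWS.items():
        print(f"{name}: computed {combined_quality(miou, f1):.2f}, reported {q}")
    print(f"max deviation: {max_deviation(ROWS):.3f}")
```

Because the table entries are rounded to one decimal, small gaps between the computed and reported values are expected. Under this reading, a method such as UNet can reach a high mIoU (90.1) and still score a low Q (13.9), because its F1 is low (15.4).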