MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

2020-12-01 | CVPR 2021 | Panoptic Segmentation

Abstract

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set. Code is available at https://github.com/google-research/deeplab2.
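The abstract reports results in panoptic quality (PQ) and mentions a PQ-inspired training loss. As background, here is a minimal sketch of the standard PQ evaluation metric for a single class: the sum of IoUs over matched segment pairs divided by TP + 0.5 FP + 0.5 FN, where a pair counts as matched when its IoU exceeds 0.5. This is the evaluation metric only, not the paper's loss or matching procedure. The function name, the dictionary input format, and the example values are assumptions made for illustration.

```python
def panoptic_quality(pair_iou, num_gt, num_pred, threshold=0.5):
    """Panoptic quality for one class (illustrative helper, not from the paper).

    pair_iou maps (gt_id, pred_id) -> IoU for overlapping segment pairs.
    Because panoptic segments do not overlap each other, any pair with
    IoU > 0.5 is a unique match, so no explicit assignment step is needed.
    """
    matched = [iou for iou in pair_iou.values() if iou > threshold]
    tp = len(matched)        # matched pairs
    fp = num_pred - tp       # predicted segments left unmatched
    fn = num_gt - tp         # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched) / denom if denom > 0 else 0.0


# Hypothetical example: 3 ground-truth and 3 predicted segments.
example_iou = {(0, 0): 0.8, (1, 1): 0.6, (2, 1): 0.3}
example_pq = panoptic_quality(example_iou, num_gt=3, num_pred=3)
print(example_pq)
```

In this example two pairs clear the threshold, leaving one unmatched prediction and one unmatched ground-truth segment, so PQ = (0.8 + 0.6) / (2 + 0.5 + 0.5).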

Results

Task                   Dataset        Metric  Value  Model
Panoptic Segmentation  COCO test-dev  PQ      51.3   MaX-DeepLab-L (single-scale)
Panoptic Segmentation  COCO test-dev  PQst    42.4   MaX-DeepLab-L (single-scale)
Panoptic Segmentation  COCO test-dev  PQth    57.2   MaX-DeepLab-L (single-scale)
Panoptic Segmentation  COCO minival   PQ      51.1   MaX-DeepLab-L (single-scale)
Panoptic Segmentation  COCO minival   PQst    42.2   MaX-DeepLab-L (single-scale)
Panoptic Segmentation  COCO minival   PQth    57.0   MaX-DeepLab-L (single-scale)
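
The rows above report PQ together with its stuff (PQst) and thing (PQth) components. As a consistency check, the sketch below recombines the two components into overall PQ. It assumes the standard COCO panoptic split of 80 thing classes and 53 stuff classes and that PQ is the unweighted mean of per-class PQ; neither detail is stated on this page, and the function name is hypothetical.

```python
NUM_THING_CLASSES = 80  # assumed: standard COCO panoptic thing classes
NUM_STUFF_CLASSES = 53  # assumed: standard COCO panoptic stuff classes


def combine_pq(pq_th, pq_st, n_th=NUM_THING_CLASSES, n_st=NUM_STUFF_CLASSES):
    """Class-count-weighted mean of thing PQ and stuff PQ."""
    return (n_th * pq_th + n_st * pq_st) / (n_th + n_st)


test_dev_pq = combine_pq(pq_th=57.2, pq_st=42.4)
minival_pq = combine_pq(pq_th=57.0, pq_st=42.2)
print(round(test_dev_pq, 1), round(minival_pq, 1))  # prints: 51.3 51.1
```

Both recombined values agree with the reported overall PQ to one decimal place.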

Related Papers

2025-07-14  DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation
2025-07-07  OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
2025-06-26  HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation
2025-06-26  PanSt3R: Multi-view Consistent Panoptic Segmentation
2025-06-16  Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning
2025-06-16  A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects
2025-05-26  The Missing Point in Vision Transformers for Universal Image Segmentation
2025-05-25  How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation