Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

Abdullah Rashwan, Jiageng Zhang, Ali Taalimi, Fan Yang, Xingyi Zhou, Chaochao Yan, Liang-Chieh Chen, Yeqing Li

2023-12-11 | Panoptic Segmentation

Abstract

In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation of both semantic and instance classes as global binary masks. In this paper, we revisit the pure convolution model and propose a novel panoptic architecture named MaskConver. MaskConver fully unifies the representation of things and stuff by predicting their centers. To that end, it introduces a lightweight class embedding module that breaks ties when multiple centers co-exist at the same location. Furthermore, our study shows that the decoder design is critical to ensuring that the model has sufficient context for accurate detection and segmentation. We introduce a powerful ConvNeXt-UNet decoder that closes the performance gap between convolution- and transformer-based models. With a ResNet50 backbone, MaskConver achieves 53.6% PQ on the COCO panoptic val set, outperforming the modern convolution-based model Panoptic FCN by 9.3% PQ, as well as transformer-based models such as Mask2Former (+1.7% PQ) and kMaX-DeepLab (+0.6% PQ). Additionally, MaskConver with a MobileNet backbone reaches 37.2% PQ, improving over Panoptic-DeepLab by 6.4% PQ under the same FLOPs/latency constraints. A further optimized version of MaskConver achieves 29.7% PQ while running in real time on mobile devices. The code and model weights will be publicly available.
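The comparisons in the abstract are stated as gains, so the baseline scores are only implied. The short sketch below recovers them by subtraction, assuming each gain is an absolute difference in PQ points measured with the same backbone as the MaskConver figure it is compared against. The resulting baseline values are derived here for illustration and are not quoted from the page; the variable and function names are hypothetical.

```python
# MaskConver PQ values as stated in the abstract, keyed by backbone.
MASKCONVER_PQ = {"ResNet50": 53.6, "MobileNet": 37.2}

# (baseline model, backbone used for the comparison, gain stated in the abstract).
# Assumption: every gain is an absolute difference in PQ points.
REPORTED_GAINS = [
    ("Panoptic FCN", "ResNet50", 9.3),
    ("Mask2Former", "ResNet50", 1.7),
    ("kMaX-DeepLab", "ResNet50", 0.6),
    ("Panoptic-DeepLab", "MobileNet", 6.4),
]


def implied_baseline_pq():
    """Return the baseline PQ implied by each stated gain, rounded to one decimal."""
    return {
        name: round(MASKCONVER_PQ[backbone] - gain, 1)
        for name, backbone, gain in REPORTED_GAINS
    }


if __name__ == "__main__":
    for name, pq in implied_baseline_pq().items():
        print(f"{name}: implied PQ {pq}")
```

If the gains were instead relative percentages, the subtraction above would not apply, so the output should be read only under the stated assumption.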

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Semantic Segmentation | COCO test-dev | PQ | 53.6 | MaskConver (ResNet50, single-scale) |
| Semantic Segmentation | COCO test-dev | PQst | 58.9 | MaskConver (ResNet50, single-scale) |
| Semantic Segmentation | COCO test-dev | PQth | 45.6 | MaskConver (ResNet50, single-scale) |
| 10-shot image generation | COCO test-dev | PQ | 53.6 | MaskConver (ResNet50, single-scale) |
| 10-shot image generation | COCO test-dev | PQst | 58.9 | MaskConver (ResNet50, single-scale) |
| 10-shot image generation | COCO test-dev | PQth | 45.6 | MaskConver (ResNet50, single-scale) |
| Panoptic Segmentation | COCO test-dev | PQ | 53.6 | MaskConver (ResNet50, single-scale) |
| Panoptic Segmentation | COCO test-dev | PQst | 58.9 | MaskConver (ResNet50, single-scale) |
| Panoptic Segmentation | COCO test-dev | PQth | 45.6 | MaskConver (ResNet50, single-scale) |
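
For readers unfamiliar with the metric columns: PQ (panoptic quality) is computed per class from predicted and ground-truth segments that match with IoU above 0.5, as the sum of matched IoUs divided by |TP| + 0.5|FP| + 0.5|FN|, and is then averaged over classes. PQth and PQst are the same average restricted to thing and stuff classes respectively. The sketch below illustrates the standard definition for a single class; it is not the evaluation code used to produce these results, the function names are hypothetical, and segment matching is assumed to have been done already.

```python
from typing import Sequence, Tuple


def panoptic_quality(
    matched_ious: Sequence[float], num_fp: int, num_fn: int
) -> Tuple[float, float, float]:
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each matched (prediction, ground truth) pair,
                  i.e. the true positives; a match requires IoU > 0.5.
    num_fp:       predicted segments with no match.
    num_fn:       ground-truth segments with no match.
    """
    if any(iou <= 0.5 or iou > 1.0 for iou in matched_ious):
        raise ValueError("matched segments must satisfy 0.5 < IoU <= 1.0")
    num_tp = len(matched_ious)
    denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        # Simplification: a class with no segments at all scores zero here;
        # evaluation tools usually exclude such classes from the average.
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / num_tp if num_tp else 0.0  # segmentation quality
    rq = num_tp / denominator                           # recognition quality
    return sq * rq, sq, rq


def mean_pq(per_class_pq: Sequence[float]) -> float:
    """Average per-class PQ values (over all, thing-only, or stuff-only classes)."""
    return sum(per_class_pq) / len(per_class_pq)


if __name__ == "__main__":
    pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
    print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```

Factoring PQ into SQ times RQ separates how well matched segments overlap from how many segments are found at all, which is why both are often reported alongside PQ.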

Related Papers

DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts (2025-07-07)
HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation (2025-06-26)
PanSt3R: Multi-view Consistent Panoptic Segmentation (2025-06-26)
Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning (2025-06-16)
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects (2025-06-16)
The Missing Point in Vision Transformers for Universal Image Segmentation (2025-05-26)
How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation (2025-05-25)