SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

Qin Liu, Zhenlin Xu, Gedas Bertasius, Marc Niethammer

2022-10-20 | ICCV 2023 | Tasks: Interactive Segmentation, Segmentation, Semantic Segmentation, Image Segmentation
Abstract

Click-based interactive image segmentation aims at extracting objects with a limited number of user clicks. A hierarchical backbone is the de facto architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to be a foundation model that can be finetuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has been proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. Based on the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method achieves 4.15 NoC@90 on SBD, a 21.8% improvement over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We further develop an extremely tiny ViT backbone for SimpleClick and provide a detailed computational analysis, highlighting its suitability as a practical annotation tool.
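
The abstract reports results as NoC@90, the average number of clicks needed to reach 90% IoU, and as a relative improvement on that lower-is-better score. The sketch below shows one way such numbers might be computed. It is not the authors' evaluation code: the function names, the IoU traces, and the cap of 20 clicks per sample are all assumptions made for illustration.

```python
from typing import Sequence


def noc_at(ious_per_click: Sequence[float], threshold: float,
           max_clicks: int = 20) -> int:
    """Number of clicks until the IoU first reaches `threshold`.

    ious_per_click[i] is the IoU after click i + 1. If the threshold is
    never reached within `max_clicks`, the sample counts as `max_clicks`
    (the cap of 20 is an assumption, not stated in the text above).
    """
    for n, iou in enumerate(ious_per_click[:max_clicks], start=1):
        if iou >= threshold:
            return n
    return max_clicks


def mean_noc(all_ious: Sequence[Sequence[float]], threshold: float,
             max_clicks: int = 20) -> float:
    """Average NoC over a dataset; lower means fewer clicks were needed."""
    counts = [noc_at(trace, threshold, max_clicks) for trace in all_ious]
    return sum(counts) / len(counts)


def relative_improvement(previous: float, current: float) -> float:
    """Fractional reduction of a lower-is-better metric."""
    return (previous - current) / previous


# Hypothetical per-click IoU traces for three images (not real data).
traces = [
    [0.70, 0.88, 0.93],
    [0.91],
    [0.50, 0.60, 0.80, 0.86, 0.89],
]

print("NoC@85:", mean_noc(traces, 0.85))
print("NoC@90:", mean_noc(traces, 0.90))
```

In this toy data the third image never reaches 90% IoU, so it contributes the full cap to NoC@90. That is why a few hard samples can dominate the average.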

Results

Task                      Dataset   Metric  Value  Model
Interactive Segmentation  GrabCut   NoC@85  1.32   SimpleClick (ViT-L, C+L)
Interactive Segmentation  GrabCut   NoC@90  1.4    SimpleClick (ViT-L, C+L)
Interactive Segmentation  GrabCut   NoC@85  1.32   SimpleClick (ViT-H, SBD)
Interactive Segmentation  GrabCut   NoC@90  1.44   SimpleClick (ViT-H, SBD)
Interactive Segmentation  Berkeley  NoC@90  1.75   SimpleClick (ViT-H, C+L)
Interactive Segmentation  Berkeley  NoC@90  2.09   SimpleClick (ViT-H, SBD)
Interactive Segmentation  DAVIS     NoC@85  3.41   SimpleClick (ViT-H, C+L)
Interactive Segmentation  DAVIS     NoC@90  4.7    SimpleClick (ViT-H, C+L)
Interactive Segmentation  DAVIS     NoC@85  4.2    SimpleClick (ViT-H, SBD)
Interactive Segmentation  DAVIS     NoC@90  5.34   SimpleClick (ViT-H, SBD)
Interactive Segmentation  SBD       NoC@85  2.51   SimpleClick
Interactive Segmentation  SBD       NoC@90  4.15   SimpleClick
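
Because NoC counts clicks, a lower value is better. As a rough illustration of how to read the results above, the sketch below stores the listed rows as tuples and picks the lowest value for each dataset and metric. The helper name `best_per_benchmark` is hypothetical, and ties keep the first row listed.

```python
# Rows copied from the results above: (dataset, metric, value, model).
ROWS = [
    ("GrabCut", "NoC@85", 1.32, "SimpleClick (ViT-L, C+L)"),
    ("GrabCut", "NoC@90", 1.4, "SimpleClick (ViT-L, C+L)"),
    ("GrabCut", "NoC@85", 1.32, "SimpleClick (ViT-H, SBD)"),
    ("GrabCut", "NoC@90", 1.44, "SimpleClick (ViT-H, SBD)"),
    ("Berkeley", "NoC@90", 1.75, "SimpleClick (ViT-H, C+L)"),
    ("Berkeley", "NoC@90", 2.09, "SimpleClick (ViT-H, SBD)"),
    ("DAVIS", "NoC@85", 3.41, "SimpleClick (ViT-H, C+L)"),
    ("DAVIS", "NoC@90", 4.7, "SimpleClick (ViT-H, C+L)"),
    ("DAVIS", "NoC@85", 4.2, "SimpleClick (ViT-H, SBD)"),
    ("DAVIS", "NoC@90", 5.34, "SimpleClick (ViT-H, SBD)"),
    ("SBD", "NoC@85", 2.51, "SimpleClick"),
    ("SBD", "NoC@90", 4.15, "SimpleClick"),
]


def best_per_benchmark(rows):
    """Return {(dataset, metric): (value, model)} with the lowest value."""
    best = {}
    for dataset, metric, value, model in rows:
        key = (dataset, metric)
        # Strict '<' keeps the first row when two entries tie.
        if key not in best or value < best[key][0]:
            best[key] = (value, model)
    return best


best = best_per_benchmark(ROWS)
for (dataset, metric), (value, model) in sorted(best.items()):
    print(f"{dataset:<9} {metric}  {value:<5} {model}")
```

On these rows, the C+L entries are never worse than the SBD entries for the same dataset and metric, with a tie on GrabCut NoC@85.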

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)