


QTSeg: A Query Token-Based Architecture for Efficient 2D Medical Image Segmentation

Phuong-Nam Tran, Nhat Truong Pham, Duc Ngoc Minh Dang, Eui-Nam Huh, Choong Seon Hong

2024-12-23
Tasks: Skin Lesion Segmentation, Breast Cancer Detection, Semantic Segmentation, Medical Image Segmentation, Image Segmentation

Abstract

Medical image segmentation plays a crucial role in assisting doctors with diagnosis and in enabling accurate automatic diagnosis. While advanced convolutional neural networks (CNNs) excel at segmenting regions of interest with pixel-level precision, they often struggle to capture long-range dependencies, which are crucial for model performance. Conversely, transformer architectures use attention mechanisms to handle long-range dependencies well. However, the computational complexity of transformers grows quadratically with the number of input tokens, which makes them resource-intensive, especially for high-resolution medical images. Recent research aims to combine CNN and transformer architectures to mitigate their respective drawbacks and improve performance while keeping resource demands low. Nevertheless, existing approaches have not fully leveraged the strengths of both architectures to achieve high accuracy with low computational requirements. To address this gap, we propose QTSeg, a novel architecture for 2D medical image segmentation that uses a feature pyramid network (FPN) as the image encoder, a multi-level feature fusion (MLFF) module as the adapter between encoder and decoder, and a multi-query mask decoder (MQM Decoder) as the mask decoder. First, the FPN extracts pyramid features from the input image. Next, MLFF adapts features from different encoder stages to the decoder. Finally, the MQM Decoder improves mask generation by integrating query tokens with pyramid features at all stages of the mask decoder. Our experimental results show that QTSeg outperforms state-of-the-art methods across all metrics with lower computational demands than the baseline and existing methods. Code is available at https://github.com/tpnam0901/QTSeg (v0.1.0).
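To make the abstract's remark about quadratic cost concrete, the following minimal sketch counts the entries of one self-attention score matrix when an image is split into non-overlapping square patches. The function name, the patch size of 16, and the image sizes are illustrative assumptions and are not taken from the paper or the QTSeg code.

```python
def attention_matrix_entries(height: int, width: int, patch: int = 16) -> int:
    """Entries in one self-attention score matrix for a patch-tokenised image.

    Assumes non-overlapping square patches and full (global) self-attention.
    Both are illustrative assumptions, not details taken from QTSeg.
    """
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    # One token per patch.
    tokens = (height // patch) * (width // patch)
    # Every token attends to every token, so the score matrix is tokens x tokens.
    return tokens * tokens


if __name__ == "__main__":
    for side in (256, 512, 1024):
        n = attention_matrix_entries(side, side)
        print(f"{side}x{side}: {n:,} score entries per head per layer")
```

Under these assumptions, doubling the image side multiplies the token count by 4 and the score-matrix size by 16, which is why high-resolution inputs become expensive for plain global attention.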

Results

Task                        Dataset                  Metric                  Value   Model
Medical Image Segmentation  BKAI-IGH NeoPolyp-Small  Average Dice (5-folds)  93.13   QTSeg
Medical Image Segmentation  BKAI-IGH NeoPolyp-Small  MAE (5-folds)           0.06    QTSeg
Medical Image Segmentation  BKAI-IGH NeoPolyp-Small  mIoU (5-folds)          88.94   QTSeg
Medical Image Segmentation  ISIC2016                 ACC                     96.41   QTSeg
Medical Image Segmentation  ISIC2016                 Average IOU             86.74   QTSeg
Medical Image Segmentation  ISIC2016                 Dice                    92.42   QTSeg
Medical Image Segmentation  ISIC2016                 MAE                     0.0359  QTSeg
Skin                        ISIC2016                 ACC                     96.41   QTSeg
Skin                        ISIC2016                 Average IOU             86.74   QTSeg
Skin                        ISIC2016                 Dice                    92.42   QTSeg
Skin                        ISIC2016                 MAE                     0.0359  QTSeg
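
For readers unfamiliar with the metrics in the table, the sketch below computes Dice, IoU, pixel accuracy (ACC), and mean absolute error (MAE) for a pair of flattened binary masks using their standard definitions. It is only an illustration: the function name and the toy masks are hypothetical, and the averaging protocol the authors used (per image, per fold, thresholding of soft predictions) is not stated on this page.

```python
from typing import Dict, Sequence


def binary_mask_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Dice, IoU, accuracy and MAE for two flattened binary masks (values 0 or 1)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("masks must be non-empty and the same length")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = len(pred) - tp - fp - fn
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    acc = (tp + tn) / len(pred)
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    return {"dice": dice, "iou": iou, "acc": acc, "mae": mae}


if __name__ == "__main__":
    pred = [1, 1, 1, 0, 0, 0, 0, 0]    # toy prediction, hypothetical
    target = [1, 1, 0, 1, 0, 0, 0, 0]  # toy ground truth, hypothetical
    print(binary_mask_metrics(pred, target))
```

For strictly binary masks, MAE equals 1 minus ACC. The ISIC2016 rows are consistent with that relationship (ACC 96.41, MAE 0.0359), although this alone does not confirm how the authors computed either value.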

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)