MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder

Md Motiur Rahman, Shiva Shokouhmand, Smriti Bhatt, Miad Faezipour

2023-10-30

Tasks: Segmentation, Semantic Segmentation, Medical Image Segmentation, Image Segmentation

Abstract

Transformers are a common and promising deep learning approach for medical image segmentation because self-attention lets them capture long-range dependencies among pixels. Despite this success, transformers are limited in capturing the local context of pixels in multimodal dimensions. To address this issue, we propose a Medical Image Segmentation Transformer (MIST) that incorporates a novel Convolutional Attention Mixing (CAM) decoder. MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) serves as the encoder, and the encoded feature representation is passed through the CAM decoder to segment the images. The CAM decoder introduces an attention-mixer that combines multi-head self-attention, spatial attention, and squeeze-and-excitation attention modules to capture long-range dependencies in all spatial dimensions. To increase spatial information gain, deep and shallow convolutions are used for feature extraction and receptive field expansion, respectively. Skip connections integrate low-level and high-level features from different network stages, allowing MIST to suppress unnecessary information. Experiments show that MIST with the CAM decoder outperforms state-of-the-art models specifically designed for medical image segmentation on the ACDC and Synapse datasets. Our results also show that adding the CAM decoder to a hierarchical transformer significantly improves segmentation performance. The model, data, and code are publicly available on GitHub.
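The attention-mixer described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not say how the three attention branches are combined, so summing their outputs is an assumption. The spatial-attention branch is CBAM-style, but the learned convolution over the channel-mean and channel-max maps is replaced by a plain average as a simplification. All function names, weight shapes, and the squeeze-and-excitation reduction ratio of 4 are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excitation(x, w1, w2):
    # x: (C, H, W). Squeeze: global average pool per channel -> (C,).
    s = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU, then sigmoid) -> per-channel weights in (0, 1).
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))
    return x * a[:, None, None]

def spatial_attention(x):
    # Channel-mean and channel-max maps, each (H, W). CBAM would apply a learned
    # conv to these; this sketch just averages them (simplification).
    a = sigmoid(0.5 * (x.mean(axis=0) + x.max(axis=0)))
    return x * a[None, :, :]

def multi_head_self_attention(x, wq, wk, wv, num_heads):
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T              # (N, C): every pixel is a token
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    d = c // num_heads
    out = np.empty_like(tokens)
    for i in range(num_heads):
        sl = slice(i * d, (i + 1) * d)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(d)   # (N, N) scaled dot products
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)             # row-wise softmax
        out[:, sl] = p @ v[:, sl]
    return out.T.reshape(c, h, w)

def attention_mixer(x, params, num_heads=2):
    # Hypothetical combination: plain sum of the three branches.
    return (multi_head_self_attention(x, params["wq"], params["wk"], params["wv"], num_heads)
            + spatial_attention(x)
            + squeeze_excitation(x, params["w1"], params["w2"]))

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
params = {
    "wq": 0.1 * rng.standard_normal((C, C)),
    "wk": 0.1 * rng.standard_normal((C, C)),
    "wv": 0.1 * rng.standard_normal((C, C)),
    "w1": 0.1 * rng.standard_normal((C // 4, C)),  # reduction ratio 4 (assumed)
    "w2": 0.1 * rng.standard_normal((C, C // 4)),
}
y = attention_mixer(x, params, num_heads=2)
print(y.shape)  # (8, 4, 4)
```

Each branch keeps the (C, H, W) shape, so they can be mixed element-wise. Self-attention relates every pixel to every other pixel (long-range context), while the two gating branches reweight channels and spatial positions.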

Results

Task	Dataset	Metric	Value	Model
Medical Image Segmentation	Synapse multi-organ CT	Average DSC (%, higher is better)	86.92	MIST
Medical Image Segmentation	Synapse multi-organ CT	Average HD (lower is better)	11.07	MIST
Medical Image Segmentation	Automatic Cardiac Diagnosis Challenge (ACDC)	Average DSC (%, higher is better)	92.56	MIST
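The two metrics in the table, the Dice similarity coefficient (DSC) and the Hausdorff distance (HD), can be sketched as follows. This sketch assumes 2D binary masks per class and returns DSC as a fraction, so multiply by 100 to compare with the table. The HD here is the classic symmetric Hausdorff distance measured in pixels. Published benchmarks often use a percentile variant such as HD95 with physical voxel spacing, and this page does not state which variant produced the reported value. The helper names are hypothetical.

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) for binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: scored as perfect agreement (a common convention)
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance, in pixels, between two non-empty binary masks."""
    a = np.argwhere(np.asarray(pred, dtype=bool))  # (Na, 2) foreground coordinates
    b = np.argwhere(np.asarray(gt, dtype=bool))    # (Nb, 2)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Brute-force pairwise Euclidean distances, shape (Na, Nb).
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    # Worst-case nearest-neighbour distance in each direction; keep the larger.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def mean_over_classes(pred_labels, gt_labels, classes, metric):
    """Average a binary-mask metric over classes (e.g. one class per organ)."""
    return float(np.mean([metric(pred_labels == c, gt_labels == c) for c in classes]))

gt = np.zeros((5, 5), dtype=int)
gt[1:4, 1:4] = 1      # 3x3 square
pred = np.zeros((5, 5), dtype=int)
pred[1:4, 2:5] = 1    # same square shifted one column to the right

print(dice(pred, gt))       # 6 shared pixels out of 9 + 9 -> 0.666...
print(hausdorff(pred, gt))  # 1.0
```

The brute-force distance matrix grows as Na × Nb, which is fine for small masks. Large 3D volumes are usually handled with a distance transform instead.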
