Temporal Memory Attention for Video Semantic Segmentation

Hao Wang, Weining Wang, Jing Liu

2021-02-17

Segmentation, Semantic Segmentation, Video Semantic Segmentation

Abstract

Video semantic segmentation requires exploiting the complex temporal relations between frames of a video sequence. Previous works usually rely on accurate optical flow to leverage these temporal relations, which incurs heavy computational cost. In this paper, we propose a Temporal Memory Attention Network (TMANet) that adaptively integrates long-range temporal relations over the video sequence using the self-attention mechanism, without exhaustive optical flow prediction. Specifically, we construct a memory from several past frames to store the temporal information of the current frame. We then propose a temporal memory attention module that captures the relation between the current frame and the memory to enhance the representation of the current frame. Our method achieves new state-of-the-art performance on two challenging video semantic segmentation datasets: 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50.
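The idea of attending from the current frame to a memory of past frames can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `temporal_memory_attention`, the linear projections `w_q`, `w_k`, `w_v`, the scaling by the square root of the key dimension, and the final concatenation are all assumptions chosen to show one plausible form of query/key/value attention over a frame memory.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_memory_attention(current, memory, w_q, w_k, w_v):
    """Hypothetical sketch of attention from the current frame to a memory.

    current: (C, H, W) features of the current frame
    memory:  (T, C, H, W) features of T past frames
    w_q, w_k: (D, C) assumed query/key projections
    w_v:      (Cv, C) assumed value projection
    """
    c, h, w = current.shape
    t = memory.shape[0]

    cur_flat = current.reshape(c, h * w)                       # (C, HW)
    mem_flat = memory.transpose(1, 0, 2, 3).reshape(c, t * h * w)  # (C, THW)

    q = w_q @ cur_flat   # (D, HW): one query per current-frame position
    k = w_k @ mem_flat   # (D, THW): one key per memory position
    v = w_v @ mem_flat   # (Cv, THW): one value per memory position

    # Similarity of every current position to every memory position.
    # The 1/sqrt(D) scaling is an assumption borrowed from standard attention.
    scores = (q.T @ k) / np.sqrt(q.shape[0])   # (HW, THW)
    attn = softmax(scores, axis=-1)

    # Weighted read-out of memory values for each current position.
    read = v @ attn.T                          # (Cv, HW)

    # Assumed fusion: concatenate the read-out with the current features.
    fused = np.concatenate([cur_flat, read], axis=0)
    return fused.reshape(c + v.shape[0], h, w), attn


rng = np.random.default_rng(0)
current = rng.standard_normal((8, 4, 5))
memory = rng.standard_normal((3, 8, 4, 5))
w_q = rng.standard_normal((4, 8))
w_k = rng.standard_normal((4, 8))
w_v = rng.standard_normal((6, 8))
fused, attn = temporal_memory_attention(current, memory, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` is a distribution over all positions in all memory frames, so every pixel of the current frame can draw on any location in the past frames without an explicit optical flow estimate.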

Results

Task	Dataset	Metric	Value	Model
Scene Parsing	Cityscapes val	mIoU	80.3	TMANet-50
Scene Parsing	CamVid	Mean IoU	76.5	TMANet-50
Scene Parsing	CamVid	Mean IoU	74.7	Netwarp
Semantic Segmentation	UrbanLF	mIoU (Real)	77.14	TMANet
Semantic Segmentation	UrbanLF	mIoU (Syn)	76.41	TMANet
Video Semantic Segmentation	Cityscapes val	mIoU	80.3	TMANet-50
Video Semantic Segmentation	CamVid	Mean IoU	76.5	TMANet-50
Video Semantic Segmentation	CamVid	Mean IoU	74.7	Netwarp
Scene Understanding	Cityscapes val	mIoU	80.3	TMANet-50
Scene Understanding	CamVid	Mean IoU	76.5	TMANet-50
Scene Understanding	CamVid	Mean IoU	74.7	Netwarp
2D Semantic Segmentation	Cityscapes val	mIoU	80.3	TMANet-50
2D Semantic Segmentation	CamVid	Mean IoU	76.5	TMANet-50
2D Semantic Segmentation	CamVid	Mean IoU	74.7	Netwarp
10-shot image generation	UrbanLF	mIoU (Real)	77.14	TMANet
10-shot image generation	UrbanLF	mIoU (Syn)	76.41	TMANet
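
The mIoU values in the table are mean intersection-over-union scores. As a rough illustration of the metric, the sketch below computes mIoU from a confusion matrix. The function name `mean_iou` and the `ignore_index=255` default are assumptions for this example; each benchmark's official evaluation script defines its own class set and ignore label, and typically accumulates the confusion matrix over the whole dataset before averaging.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the target."""
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()

    # Drop pixels marked with the (assumed) ignore label.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes with a non-empty union.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.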
