Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

An Efficient 3D CNN for Action/Object Segmentation in Video

Rui Hou, Chen Chen, Rahul Sukthankar, Mubarak Shah

2019-07-21

Tasks: Visual Object Tracking, Action Segmentation, Semi-Supervised Video Object Segmentation, Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Image Segmentation

Abstract

Convolutional Neural Network (CNN) based image segmentation has made great progress in recent years. However, video object segmentation remains a challenging task due to its high computational complexity. Most of the previous methods employ a two-stream CNN framework to handle spatial and motion features separately. In this paper, we propose an end-to-end encoder-decoder style 3D CNN to aggregate spatial and temporal information simultaneously for video object segmentation. To efficiently process video, we propose 3D separable convolution for the pyramid pooling module and decoder, which dramatically reduces the number of operations while maintaining the performance. Moreover, we also extend our framework to video action segmentation by adding an extra classifier to predict the action label for actors in videos. Extensive experiments on several video datasets demonstrate the superior performance of the proposed approach for action and object segmentation compared to the state-of-the-art.
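To make the efficiency claim concrete, the sketch below counts multiply-accumulate operations (MACs) per output position for a dense 3D convolution and for a separable one. The abstract does not spell out the factorization, so this assumes the common depthwise-separable reading (a per-channel 3D convolution followed by a 1x1x1 pointwise convolution); the function names and the 64-channel, 3x3x3 configuration are hypothetical and not taken from the paper.

```python
from math import prod


def standard_conv3d_macs(c_in, c_out, kernel):
    """MACs per output position for a dense 3D convolution."""
    return c_in * c_out * prod(kernel)


def separable_conv3d_macs(c_in, c_out, kernel):
    """MACs per output position for an assumed depthwise + pointwise factorization."""
    depthwise = c_in * prod(kernel)  # one 3D filter per input channel
    pointwise = c_in * c_out         # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 64 output channels, 3x3x3 kernel.
kernel = (3, 3, 3)
dense = standard_conv3d_macs(64, 64, kernel)
separable = separable_conv3d_macs(64, 64, kernel)
ratio = separable / dense  # equals 1/c_out + 1/prod(kernel)
```

Under these assumptions the separable layer needs roughly 5% of the dense layer's operations, which is the kind of reduction the abstract refers to. The actual savings in the paper depend on its specific factorization and layer sizes.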

Results

Task | Dataset | Metric | Value | Model
Video | DAVIS 2016 | F-measure (Decay) | 4.9 | Hou et al.
Video | DAVIS 2016 | F-measure (Mean) | 77.2 | Hou et al.
Video | DAVIS 2016 | F-measure (Recall) | 84.7 | Hou et al.
Video | DAVIS 2016 | J&F | 77.75 | Hou et al.
Video | DAVIS 2016 | Jaccard (Decay) | 2.3 | Hou et al.
Video | DAVIS 2016 | Jaccard (Mean) | 78.3 | Hou et al.
Video | DAVIS 2016 | Jaccard (Recall) | 91.1 | Hou et al.
Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 4.9 | Hou et al.
Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 77.2 | Hou et al.
Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 84.7 | Hou et al.
Video Object Segmentation | DAVIS 2016 | J&F | 77.75 | Hou et al.
Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 2.3 | Hou et al.
Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 78.3 | Hou et al.
Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 91.1 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 4.9 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 77.2 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 84.7 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 77.75 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 2.3 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 78.3 | Hou et al.
Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 91.1 | Hou et al.
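As a reading aid for the metrics above, the sketch below shows how the region similarity (Jaccard index, J) is computed for a pair of binary masks and how the combined J&F score relates to the two mean values in the table: it is the average of Jaccard (Mean) and F-measure (Mean). The helper names are hypothetical, the masks are toy data, and returning 1.0 when both masks are empty is an assumed convention, not something stated on this page.

```python
def jaccard(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """Combined score: arithmetic mean of the J mean and the F mean."""
    return (j_mean + f_mean) / 2


# Toy 2x2 masks: one overlapping pixel out of two foreground pixels in total.
toy_iou = jaccard([[1, 1], [0, 0]], [[1, 0], [0, 0]])

# Values from the results table: Jaccard (Mean) 78.3 and F-measure (Mean) 77.2.
combined = j_and_f(78.3, 77.2)
```

With the table's values, the average of 78.3 and 77.2 is 77.75, which matches the reported J&F entry.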

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)