Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Actor and Action Video Segmentation from a Sentence

Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek

Published 20 March 2018 · CVPR 2018
Tasks: Action Segmentation, Referring Expression Segmentation, Segmentation, Video Segmentation, Video Semantic Segmentation
Paper · PDF · Code

Abstract

This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment pairs that are outside the actor and action vocabulary. We propose a fully-convolutional model for pixel-level actor and action segmentation using an encoder-decoder architecture optimized for video. To show the potential of actor and action video segmentation from a sentence, we extend two popular actor and action datasets with more than 7,500 natural language descriptions. Experiments demonstrate the quality of the sentence-guided segmentations, the generalization ability of our model, and its advantage over the state-of-the-art for traditional actor and action segmentation.
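The abstract describes an encoder-decoder that fuses a natural-language sentence with visual features and decodes a pixel-level mask conditioned on that sentence. The sketch below shows one plausible way to wire such sentence-guided fusion, assuming a dynamic-filter style of conditioning in which the sentence embedding becomes a 1x1 filter applied to the visual feature map. It is a minimal, single-frame illustration: the module names, layer sizes, and frame-level (rather than video) encoder are placeholders, not the authors' exact architecture.

```python
# Minimal sketch of sentence-conditioned segmentation (illustrative, not the
# paper's exact model, which is a fully-convolutional video architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceGuidedSegmenter(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, feat_dim=256):
        super().__init__()
        # Visual encoder: a small 2D CNN over a single frame keeps the sketch
        # self-contained; the paper's encoder operates on video.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Textual encoder: word embeddings pooled into a sentence vector.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The sentence vector is mapped to a dynamic 1x1 convolution filter
        # that is applied to the visual feature map (sentence-guided fusion).
        self.filter_gen = nn.Linear(embed_dim, feat_dim)
        # Decoder: refine the response map and upsample to pixel resolution.
        self.decoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, frames, sentence_tokens):
        # frames: (B, 3, H, W); sentence_tokens: (B, T) integer token ids
        feat = self.visual_encoder(frames)               # (B, C, H/8, W/8)
        sent = self.embed(sentence_tokens).mean(dim=1)   # (B, embed_dim)
        dyn_filter = self.filter_gen(sent)               # (B, C)
        # Dot-product the dynamic filter with every spatial location.
        response = torch.einsum('bchw,bc->bhw', feat, dyn_filter).unsqueeze(1)
        mask_logits = self.decoder(response)             # (B, 1, H/8, W/8)
        # Upsample to input resolution for pixel-level segmentation.
        return F.interpolate(mask_logits, size=frames.shape[-2:],
                             mode='bilinear', align_corners=False)

# Example: segment one 224x224 frame given a tokenized sentence.
model = SentenceGuidedSegmenter()
frames = torch.randn(2, 3, 224, 224)
tokens = torch.randint(0, 10000, (2, 8))
print(model(frames, tokens).shape)  # torch.Size([2, 1, 224, 224])
```

Because the segmentation head is a function of the sentence, a single trained model can respond differently to different descriptions of the same frame, which is what lets it separate fine-grained actors within one super-category and handle actor-action pairs outside a fixed vocabulary.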

Results

Task | Dataset | Metric | Value | Model
Instance Segmentation | A2D Sentences | AP | 0.215 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | IoU mean | 0.426 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | IoU overall | 0.551 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | Precision@0.5 | 0.5 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | Precision@0.6 | 0.376 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | Precision@0.7 | 0.231 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | Precision@0.8 | 0.094 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | Precision@0.9 | 0.004 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | A2D Sentences | AP | 0.198 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | IoU mean | 0.421 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | IoU overall | 0.536 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | Precision@0.5 | 0.475 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | Precision@0.6 | 0.347 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | Precision@0.7 | 0.211 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | Precision@0.8 | 0.08 | Gavrilyuk et al.
Instance Segmentation | A2D Sentences | Precision@0.9 | 0.002 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | AP | 0.267 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | IoU mean | 0.57 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | IoU overall | 0.555 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | Precision@0.5 | 0.712 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | Precision@0.6 | 0.518 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | Precision@0.7 | 0.264 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | Precision@0.8 | 0.03 | Gavrilyuk et al. (Optical flow)
Instance Segmentation | J-HMDB | AP | 0.233 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | IoU mean | 0.542 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | IoU overall | 0.541 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | Precision@0.5 | 0.699 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | Precision@0.6 | 0.46 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | Precision@0.7 | 0.173 | Gavrilyuk et al.
Instance Segmentation | J-HMDB | Precision@0.8 | 0.014 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | AP | 0.215 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | IoU mean | 0.426 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | IoU overall | 0.551 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | Precision@0.5 | 0.5 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | Precision@0.6 | 0.376 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | Precision@0.7 | 0.231 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | Precision@0.8 | 0.094 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | Precision@0.9 | 0.004 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | A2D Sentences | AP | 0.198 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | IoU mean | 0.421 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | IoU overall | 0.536 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | Precision@0.5 | 0.475 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | Precision@0.6 | 0.347 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | Precision@0.7 | 0.211 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | Precision@0.8 | 0.08 | Gavrilyuk et al.
Referring Expression Segmentation | A2D Sentences | Precision@0.9 | 0.002 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | AP | 0.267 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | IoU mean | 0.57 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | IoU overall | 0.555 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | Precision@0.5 | 0.712 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | Precision@0.6 | 0.518 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | Precision@0.7 | 0.264 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | Precision@0.8 | 0.03 | Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation | J-HMDB | AP | 0.233 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | IoU mean | 0.542 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | IoU overall | 0.541 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | Precision@0.5 | 0.699 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | Precision@0.6 | 0.46 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | Precision@0.7 | 0.173 | Gavrilyuk et al.
Referring Expression Segmentation | J-HMDB | Precision@0.8 | 0.014 | Gavrilyuk et al.
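For reference, the metrics above are commonly computed per test sample from predicted and ground-truth masks: Precision@K is the fraction of samples whose IoU exceeds threshold K, overall IoU is the total intersection over the total union across all samples, mean IoU is the unweighted average of per-sample IoU, and AP averages precision over a range of IoU thresholds. The helper below is a minimal sketch under those assumptions; the exact protocol (including the threshold range used for AP) is defined by the benchmark's own evaluation code.

```python
# Sketch of the leaderboard metrics, assuming the conventions described above.
import numpy as np

def segmentation_metrics(preds, gts, prec_thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """preds, gts: lists of boolean mask arrays, one pair per test sample."""
    ious, inter_total, union_total = [], 0, 0
    for p, g in zip(preds, gts):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / union if union > 0 else 1.0)
        inter_total += inter
        union_total += union
    ious = np.asarray(ious)
    return {
        # Overall IoU: total intersection over total union (size-weighted).
        'IoU overall': inter_total / union_total,
        # Mean IoU: unweighted average of per-sample IoU.
        'IoU mean': ious.mean(),
        # Precision@K: fraction of samples whose IoU exceeds threshold K.
        **{f'Precision@{t}': (ious > t).mean() for t in prec_thresholds},
        # AP: precision averaged over IoU thresholds 0.50:0.05:0.95
        # (an assumed convention, for illustration only).
        'AP': np.mean([(ious > t).mean() for t in np.arange(0.5, 1.0, 0.05)]),
    }

# Toy usage with two random 32x32 masks.
rng = np.random.default_rng(0)
preds = [rng.random((32, 32)) > 0.5 for _ in range(2)]
gts = [rng.random((32, 32)) > 0.5 for _ in range(2)]
print(segmentation_metrics(preds, gts))
```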

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)