Actor and Action Video Segmentation from a Sentence

Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek

2018-03-20, CVPR 2018
Tasks: Action Segmentation, Referring Expression Segmentation, Segmentation, Video Segmentation, Video Semantic Segmentation

Abstract

This paper strives for pixel-level segmentation of actors and their actions in video content. Unlike existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment pairs that are outside of the actor and action vocabulary. We propose a fully-convolutional model for pixel-level actor and action segmentation using an encoder-decoder architecture optimized for video. To show the potential of actor and action video segmentation from a sentence, we extend two popular actor and action datasets with more than 7,500 natural language descriptions. Experiments demonstrate the quality of the sentence-guided segmentations, the generalization ability of our model, and its advantage over the state of the art for traditional actor and action segmentation.

Results

Task	Dataset	Metric	Value	Model
Instance Segmentation	A2D Sentences	AP	0.215	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	IoU mean	0.426	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	IoU overall	0.551	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	Precision@0.5	0.5	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	Precision@0.6	0.376	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	Precision@0.7	0.231	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	Precision@0.8	0.094	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	Precision@0.9	0.004	Gavrilyuk et al. (Optical flow)
Instance Segmentation	A2D Sentences	AP	0.198	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	IoU mean	0.421	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	IoU overall	0.536	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	Precision@0.5	0.475	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	Precision@0.6	0.347	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	Precision@0.7	0.211	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	Precision@0.8	0.08	Gavrilyuk et al.
Instance Segmentation	A2D Sentences	Precision@0.9	0.002	Gavrilyuk et al.
Instance Segmentation	J-HMDB	AP	0.267	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	IoU mean	0.57	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	IoU overall	0.555	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	Precision@0.5	0.712	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	Precision@0.6	0.518	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	Precision@0.7	0.264	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	Precision@0.8	0.03	Gavrilyuk et al. (Optical flow)
Instance Segmentation	J-HMDB	AP	0.233	Gavrilyuk et al.
Instance Segmentation	J-HMDB	IoU mean	0.542	Gavrilyuk et al.
Instance Segmentation	J-HMDB	IoU overall	0.541	Gavrilyuk et al.
Instance Segmentation	J-HMDB	Precision@0.5	0.699	Gavrilyuk et al.
Instance Segmentation	J-HMDB	Precision@0.6	0.46	Gavrilyuk et al.
Instance Segmentation	J-HMDB	Precision@0.7	0.173	Gavrilyuk et al.
Instance Segmentation	J-HMDB	Precision@0.8	0.014	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	AP	0.215	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	IoU mean	0.426	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	IoU overall	0.551	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	Precision@0.5	0.5	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	Precision@0.6	0.376	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	Precision@0.7	0.231	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	Precision@0.8	0.094	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	Precision@0.9	0.004	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	A2D Sentences	AP	0.198	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	IoU mean	0.421	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	IoU overall	0.536	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	Precision@0.5	0.475	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	Precision@0.6	0.347	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	Precision@0.7	0.211	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	Precision@0.8	0.08	Gavrilyuk et al.
Referring Expression Segmentation	A2D Sentences	Precision@0.9	0.002	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	AP	0.267	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	IoU mean	0.57	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	IoU overall	0.555	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	Precision@0.5	0.712	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	Precision@0.6	0.518	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	Precision@0.7	0.264	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	Precision@0.8	0.03	Gavrilyuk et al. (Optical flow)
Referring Expression Segmentation	J-HMDB	AP	0.233	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	IoU mean	0.542	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	IoU overall	0.541	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	Precision@0.5	0.699	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	Precision@0.6	0.46	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	Precision@0.7	0.173	Gavrilyuk et al.
Referring Expression Segmentation	J-HMDB	Precision@0.8	0.014	Gavrilyuk et al.
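
The table reports IoU mean, IoU overall, and Precision@K without defining them. As a rough guide, the sketch below shows how these metrics are commonly computed for sentence-guided segmentation: IoU mean averages the per-sample IoU, IoU overall divides the total intersection by the total union across all samples, and Precision@K is the fraction of samples whose IoU exceeds K. These definitions are assumptions based on common practice, not taken from this page; the function names (`intersection_union`, `evaluate`) and the choice to score an empty union as 0.0 are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A mask is a 2-D grid of 0/1 values, where 1 marks a foreground pixel.
Mask = Sequence[Sequence[int]]


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersecting and united foreground pixels of two same-sized masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def evaluate(
    preds: List[Mask],
    gts: List[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute IoU mean, IoU overall, and Precision@K over a set of samples."""
    pairs = [intersection_union(p, g) for p, g in zip(preds, gts)]
    # Assumption: a sample with an empty union is scored as IoU 0.0.
    ious = [i / u if u else 0.0 for i, u in pairs]

    total_inter = sum(i for i, _ in pairs)
    total_union = sum(u for _, u in pairs)

    results = {
        "IoU mean": sum(ious) / len(ious),
        "IoU overall": total_inter / total_union if total_union else 0.0,
    }
    for t in thresholds:
        # Fraction of samples whose IoU is strictly above the threshold.
        results[f"Precision@{t}"] = sum(iou > t for iou in ious) / len(ious)
    return results


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    gts = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    for name, value in evaluate(preds, gts).items():
        print(name, value)
```

In this toy example the first sample has IoU 1.0 and the second 0.25, so IoU mean is 0.625 while IoU overall is 3/6 = 0.5. The gap illustrates why the table lists both: IoU overall weights samples by their pixel area, whereas IoU mean treats every sample equally.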
