Transforming Static Images Using Generative Models for Video Salient Object Detection

Suhwan Cho, Minhyeok Lee, Jungho Lee, Sangyoun Lee

2024-11-21

Tags: Video Salient Object Detection, Transfer Learning, Salient Object Detection, Object Detection

Abstract

In many video processing tasks, leveraging large-scale image datasets is a common strategy, as image data is more abundant and facilitates comprehensive knowledge transfer. A typical approach for simulating video from static images is to apply spatial transformations, such as affine transformations and spline warping, to create sequences that mimic temporal progression. However, in tasks like video salient object detection, where both appearance and motion cues are critical, these basic image-to-video techniques fail to produce realistic optical flows that capture the independent motion properties of each object. In this study, we show that image-to-video diffusion models can generate realistic transformations of static images while understanding the contextual relationships between image components. This ability allows the model to generate plausible optical flows that preserve semantic integrity while reflecting the independent motion of scene elements. By augmenting individual images in this way, we create large-scale image-flow pairs that significantly enhance model training. Our approach achieves state-of-the-art performance across all public benchmark datasets.
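To make the limitation of affine-based simulation concrete, the following is a minimal sketch (not the authors' code) of the dense flow field induced by a single 2x3 affine matrix. The function name `affine_flow` and the example matrix are assumptions chosen for illustration. Because every pixel's displacement is derived from the same six parameters, foreground and background cannot move independently, which is the shortcoming the abstract attributes to these basic techniques.

```python
import numpy as np


def affine_flow(height, width, matrix):
    """Return the dense flow of shape (H, W, 2) induced by a 2x3 affine matrix.

    Hypothetical helper for illustration: each pixel (x, y) is mapped to
    matrix @ [x, y, 1], and the flow is the displacement from (x, y).
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    warped = coords @ np.asarray(matrix, dtype=np.float64).T  # (H, W, 2)
    return warped - coords[..., :2]


# Assumed example: a pure translation by 3 pixels in x and -2 pixels in y.
translation = [[1.0, 0.0, 3.0],
               [0.0, 1.0, -2.0]]
flow = affine_flow(4, 5, translation)

# Every pixel receives the same displacement, regardless of which object
# it belongs to.
print(flow.shape)
print(np.unique(flow.reshape(-1, 2), axis=0))
```

With a rotation or scaling matrix the displacement varies with pixel position, but it is still a single global function of position rather than a per-object motion.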

Results

Task	Dataset	Metric	Value	Model
Video	DAVSOD-easy35	Average MAE	0.066	RealFlow
Video	DAVSOD-easy35	S-Measure	0.803	RealFlow
Video	DAVSOD-easy35	max F-Measure	0.732	RealFlow
Video	FBMS-59	Average MAE	0.028	RealFlow
Video	FBMS-59	max F-Measure	0.906	RealFlow
Video	FBMS-59	S-Measure	0.926	RealFlow
Video	DAVIS-2016	Average MAE	0.01	RealFlow
Video	DAVIS-2016	max F-Measure	0.939	RealFlow
Video	DAVIS-2016	S-Measure	0.945	RealFlow
Video	ViSal	Average MAE	0.01	RealFlow
Video	ViSal	S-Measure	0.962	RealFlow
Video	ViSal	max E-measure	0.966	RealFlow
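
For readers unfamiliar with the MAE column, the sketch below shows the standard definition of mean absolute error between a predicted saliency map and a binary ground-truth mask, both scaled to [0, 1]. The function names and the toy arrays are assumptions for illustration. The page does not state the exact evaluation protocol, so averaging per-frame values over a sequence is shown only as one plausible reading of "Average MAE".

```python
import numpy as np


def mean_absolute_error(pred, gt):
    """Per-pixel mean absolute difference between two maps in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must share a shape")
    return float(np.mean(np.abs(pred - gt)))


def average_mae(preds, gts):
    """Mean of per-frame MAE values (assumed averaging scheme)."""
    return float(np.mean([mean_absolute_error(p, g) for p, g in zip(preds, gts)]))


# Toy 2x2 example with made-up values, not data from the paper.
pred = np.array([[0.9, 0.1],
                 [0.8, 0.0]])
gt = np.array([[1.0, 0.0],
               [1.0, 0.0]])
mae = mean_absolute_error(pred, gt)
print(mae)
```

Lower MAE is better, whereas higher values are better for S-Measure, F-Measure, and E-measure.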
