Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Transforming Static Images Using Generative Models for Video Salient Object Detection

Suhwan Cho, Minhyeok Lee, Jungho Lee, Sangyoun Lee

2024-11-21
Tasks: Video Salient Object Detection, Transfer Learning, Salient Object Detection, Object Detection

Abstract

In many video processing tasks, leveraging large-scale image datasets is a common strategy, as image data is more abundant and facilitates comprehensive knowledge transfer. A typical approach for simulating video from static images involves applying spatial transformations, such as affine transformations and spline warping, to create sequences that mimic temporal progression. However, in tasks like video salient object detection, where both appearance and motion cues are critical, these basic image-to-video techniques fail to produce realistic optical flows that capture the independent motion properties of each object. In this study, we show that image-to-video diffusion models can generate realistic transformations of static images while understanding the contextual relationships between image components. This ability allows the model to generate plausible optical flows, preserving semantic integrity while reflecting the independent motion of scene elements. By augmenting individual images in this way, we create large-scale image-flow pairs that significantly enhance model training. Our approach achieves state-of-the-art performance across all public benchmark datasets, outperforming existing approaches.
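The limitation of the basic image-to-video techniques mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `affine_flow` and its parameters are hypothetical, and NumPy is assumed. It computes the dense optical flow induced by applying one affine transformation to a static image. Because a single matrix and translation define the displacement of every pixel, foreground objects and background all follow the same global motion model, so no object can move independently.

```python
import numpy as np


def affine_flow(height, width, matrix, translation):
    """Return the dense flow (H, W, 2) induced by p -> matrix @ p + translation.

    Pixel coordinates are (x, y); flow[y, x] is the displacement of pixel (x, y).
    Hypothetical helper for illustration only.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    coords = np.stack([xs, ys], axis=-1)  # (H, W, 2), last axis is (x, y)
    matrix = np.asarray(matrix, dtype=np.float64)
    translation = np.asarray(translation, dtype=np.float64)
    warped = coords @ matrix.T + translation  # apply the affine map to every pixel
    return warped - coords  # displacement = new position - old position


# A pure translation gives exactly the same flow vector at every pixel.
shift_flow = affine_flow(4, 5, np.eye(2), (2.0, -1.0))

# A small rotation gives a flow that varies smoothly with position,
# but it is still fully determined by one global transformation.
angle = np.deg2rad(3.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
rotation_flow = affine_flow(4, 5, rotation, (0.0, 0.0))
```

Pairing each warped image with a flow of this kind yields image-flow training pairs, but every pair carries only global motion. The abstract's proposal is to obtain the transformed frames from an image-to-video diffusion model instead, so that the resulting flows can reflect independent object motion.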

Results

Dataset         Metric          Value   Model
DAVSOD-easy35   Average MAE     0.066   RealFlow
DAVSOD-easy35   S-measure       0.803   RealFlow
DAVSOD-easy35   max F-measure   0.732   RealFlow
FBMS-59         Average MAE     0.028   RealFlow
FBMS-59         max F-measure   0.906   RealFlow
FBMS-59         S-measure       0.926   RealFlow
DAVIS-2016      Average MAE     0.01    RealFlow
DAVIS-2016      max F-measure   0.939   RealFlow
DAVIS-2016      S-measure       0.945   RealFlow
ViSal           Average MAE     0.01    RealFlow
ViSal           S-measure       0.962   RealFlow
ViSal           max E-measure   0.966   RealFlow

The same results are listed under each of the following task categories: Video, Object Detection, 3D, Video Object Segmentation, RGB Salient Object Detection, 2D Classification, 2D Object Detection, 16k.
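For readers unfamiliar with the metrics in the results, the following is a minimal sketch of two of them as they are commonly defined in the salient object detection literature. The page itself does not state the evaluation protocol, so the details here are assumptions: predictions and ground truth are taken to lie in [0, 1], the F-measure uses the conventional weight beta squared = 0.3, and the maximum is taken over evenly spaced thresholds on a single frame. The function names are hypothetical, and the S-measure and E-measure are omitted.

```python
import numpy as np


def mean_absolute_error(pred, gt):
    """Mean absolute difference between a saliency map and its ground truth (lower is better)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def max_f_measure(pred, gt, beta_sq=0.3, num_thresholds=255):
    """Largest F-measure over binarization thresholds (higher is better).

    beta_sq=0.3 is the conventional choice and an assumption here.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64) > 0.5  # binarize the ground-truth mask
    best = 0.0
    for threshold in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        binary = pred > threshold
        true_positives = np.logical_and(binary, gt).sum()
        if true_positives == 0:
            continue  # precision and recall are both zero at this threshold
        precision = true_positives / binary.sum()
        recall = true_positives / gt.sum()
        f_score = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
        best = max(best, f_score)
    return float(best)


# Toy 2x2 example with made-up values, for illustration only.
ground_truth = np.array([[0.0, 1.0], [1.0, 0.0]])
prediction = np.array([[0.2, 0.9], [0.6, 0.1]])
mae = mean_absolute_error(prediction, ground_truth)
f_max = max_f_measure(prediction, ground_truth)
```

Benchmark numbers are usually aggregated over all frames of a dataset, and evaluation toolkits differ in how they perform that aggregation, so this sketch should not be expected to reproduce the reported values exactly.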
