Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Elevating Flow-Guided Video Inpainting with Reference Generation

Suhwan Cho, Seoung Wug Oh, Sangyoun Lee, Joon-Young Lee

2024-12-12 | Video Inpainting | 2k

Abstract

Video inpainting (VI) is a challenging task that requires effective propagation of observable content across frames while simultaneously generating new content not present in the original video. In this study, we propose a robust and practical VI framework that leverages a large generative model for reference generation in combination with an advanced pixel propagation algorithm. Powered by a strong generative model, our method not only significantly enhances frame-level quality for object removal but also synthesizes new content in the missing areas based on user-provided text prompts. For pixel propagation, we introduce a one-shot pixel pulling method that effectively avoids error accumulation from repeated sampling while maintaining sub-pixel precision. To evaluate various VI methods in realistic scenarios, we also propose a high-quality VI benchmark, HQVI, comprising carefully generated videos using alpha matte composition. On public benchmarks and the HQVI dataset, our method demonstrates significantly higher visual quality and metric scores compared to existing solutions. Furthermore, it can process high-resolution videos exceeding 2K resolution with ease, underscoring its superiority for real-world applications.
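The abstract says the HQVI videos are generated using alpha matte composition. As a rough illustration of that general technique, and not the authors' actual pipeline, the sketch below blends a foreground over a background with the standard formula `alpha * fg + (1 - alpha) * bg`. The function names, the grayscale nested-list frame format, and the idea of deriving the inpainting mask from the matte are all assumptions made for this example.

```python
def composite_pixel(fg, bg, alpha):
    """Blend one foreground value over one background value; alpha is in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * fg + (1.0 - alpha) * bg


def composite_frame(fg, bg, matte):
    """Composite a foreground frame over a background frame.

    fg, bg, and matte are equally sized 2D lists of floats
    (grayscale for brevity; a hypothetical frame format).
    """
    return [
        [composite_pixel(f, b, a) for f, b, a in zip(fg_row, bg_row, matte_row)]
        for fg_row, bg_row, matte_row in zip(fg, bg, matte)
    ]


def inpainting_mask(matte, threshold=0.0):
    """Assumption: every pixel the foreground touches (alpha > threshold) is a hole."""
    return [[1 if a > threshold else 0 for a in row] for row in matte]


# Tiny 1x2 example: the first pixel is half covered, the second is untouched.
foreground = [[1.0, 1.0]]
background = [[0.0, 0.5]]
matte = [[0.5, 0.0]]

composited = composite_frame(foreground, background, matte)
mask = inpainting_mask(matte)
```

Under this assumed setup, the clean background would act as ground truth, the composited frame as the input, and the mask as the region to remove.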

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D | HQVI (240p) | LPIPS | 0.0335 | RGVI |
| 3D | HQVI (240p) | PSNR | 30.66 | RGVI |
| 3D | HQVI (240p) | SSIM | 0.9527 | RGVI |
| 3D | HQVI (240p) | VFID | 0.1825 | RGVI |
| 3D | HQVI (240p) | LPIPS | 0.039 | RGVI w/o Ref. |
| 3D | HQVI (240p) | PSNR | 31.6 | RGVI w/o Ref. |
| 3D | HQVI (240p) | SSIM | 0.9559 | RGVI w/o Ref. |
| 3D | HQVI (240p) | VFID | 0.1868 | RGVI w/o Ref. |
| 3D | HQVI (2K) | LPIPS | 0.0357 | RGVI |
| 3D | HQVI (2K) | PSNR | 30.1 | RGVI |
| 3D | HQVI (2K) | SSIM | 0.9489 | RGVI |
| 3D | HQVI (2K) | VFID | 0.0058 | RGVI |
| 3D | HQVI (2K) | LPIPS | 0.0403 | RGVI w/o Ref. |
| 3D | HQVI (2K) | PSNR | 29.81 | RGVI w/o Ref. |
| 3D | HQVI (2K) | SSIM | 0.9501 | RGVI w/o Ref. |
| 3D | HQVI (2K) | VFID | 0.0101 | RGVI w/o Ref. |
| 3D | HQVI (480p) | LPIPS | 0.0342 | RGVI |
| 3D | HQVI (480p) | PSNR | 30.9 | RGVI |
| 3D | HQVI (480p) | SSIM | 0.9513 | RGVI |
| 3D | HQVI (480p) | VFID | 0.0311 | RGVI |
| 3D | HQVI (480p) | LPIPS | 0.0403 | RGVI w/o Ref. |
| 3D | HQVI (480p) | PSNR | 31.19 | RGVI w/o Ref. |
| 3D | HQVI (480p) | SSIM | 0.9534 | RGVI w/o Ref. |
| 3D | HQVI (480p) | VFID | 0.0404 | RGVI w/o Ref. |
| Video Inpainting | HQVI (240p) | LPIPS | 0.0335 | RGVI |
| Video Inpainting | HQVI (240p) | PSNR | 30.66 | RGVI |
| Video Inpainting | HQVI (240p) | SSIM | 0.9527 | RGVI |
| Video Inpainting | HQVI (240p) | VFID | 0.1825 | RGVI |
| Video Inpainting | HQVI (240p) | LPIPS | 0.039 | RGVI w/o Ref. |
| Video Inpainting | HQVI (240p) | PSNR | 31.6 | RGVI w/o Ref. |
| Video Inpainting | HQVI (240p) | SSIM | 0.9559 | RGVI w/o Ref. |
| Video Inpainting | HQVI (240p) | VFID | 0.1868 | RGVI w/o Ref. |
| Video Inpainting | HQVI (2K) | LPIPS | 0.0357 | RGVI |
| Video Inpainting | HQVI (2K) | PSNR | 30.1 | RGVI |
| Video Inpainting | HQVI (2K) | SSIM | 0.9489 | RGVI |
| Video Inpainting | HQVI (2K) | VFID | 0.0058 | RGVI |
| Video Inpainting | HQVI (2K) | LPIPS | 0.0403 | RGVI w/o Ref. |
| Video Inpainting | HQVI (2K) | PSNR | 29.81 | RGVI w/o Ref. |
| Video Inpainting | HQVI (2K) | SSIM | 0.9501 | RGVI w/o Ref. |
| Video Inpainting | HQVI (2K) | VFID | 0.0101 | RGVI w/o Ref. |
| Video Inpainting | HQVI (480p) | LPIPS | 0.0342 | RGVI |
| Video Inpainting | HQVI (480p) | PSNR | 30.9 | RGVI |
| Video Inpainting | HQVI (480p) | SSIM | 0.9513 | RGVI |
| Video Inpainting | HQVI (480p) | VFID | 0.0311 | RGVI |
| Video Inpainting | HQVI (480p) | LPIPS | 0.0403 | RGVI w/o Ref. |
| Video Inpainting | HQVI (480p) | PSNR | 31.19 | RGVI w/o Ref. |
| Video Inpainting | HQVI (480p) | SSIM | 0.9534 | RGVI w/o Ref. |
| Video Inpainting | HQVI (480p) | VFID | 0.0404 | RGVI w/o Ref. |
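
For readers unfamiliar with the metrics: higher PSNR and SSIM are better, while lower LPIPS and VFID are better. As a minimal sketch of how a PSNR value of this kind is typically computed, the snippet below uses the standard definition `10 * log10(MAX^2 / MSE)`. The function name, the nested-list frame format, and the 8-bit peak value of 255 are assumptions for illustration; the page does not say whether the paper evaluates the whole frame or only the inpainted region.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D frames.

    Assumes frames are nested lists of numbers with peak intensity max_value.
    """
    flat_ref = [v for row in reference for v in row]
    flat_res = [v for row in restored for v in row]
    if not flat_ref or len(flat_ref) != len(flat_res):
        raise ValueError("frames must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(flat_ref, flat_res)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Each pixel is off by 10 intensity levels, so the MSE is 100.
score = psnr([[100, 100]], [[110, 90]])
```

Because the scale is logarithmic, a gap of about 1 dB, such as the one between the two model variants at 240p, corresponds to roughly a 20% difference in mean squared error.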
