Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learnable Gated Temporal Shift Module for Deep Video Inpainting

Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu

Published: 2019-07-02
Tasks: Image Inpainting · Video Inpainting

Abstract

How to efficiently utilize temporal information to recover videos in a consistent way is the main issue for video inpainting problems. Conventional 2D CNNs have achieved good performance on image inpainting but often lead to temporally inconsistent results, where frames flicker when the models are applied to videos (see https://www.youtube.com/watch?v=87Vh1HDBjD0&list=PLPoVtv-xp_dL5uckIzz1PKwNjg1yI0I94&index=1); 3D CNNs can capture temporal information but are computationally intensive and hard to train. In this paper, we present a novel component termed Learnable Gated Temporal Shift Module (LGTSM) for video inpainting models that can effectively tackle arbitrary video masks without the additional parameters of 3D convolutions. LGTSM is designed to let 2D convolutions make use of neighboring frames more efficiently, which is crucial for video inpainting. Specifically, in each layer, LGTSM learns to shift some channels to its temporal neighbors so that 2D convolutions are enhanced to handle temporal information. Meanwhile, a gated convolution is applied to the layer to identify the masked areas that are poisonous for conventional convolutions. On the FaceForensics and Free-form Video Inpainting (FVI) datasets, our model achieves state-of-the-art results with only 33% of the parameters and inference time.
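The abstract's core idea (shifting some feature channels to temporal neighbors so 2D convolutions see adjacent frames, plus a sigmoid gate to suppress masked regions) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: LGTSM replaces the fixed shift pattern below with a learnable per-channel temporal kernel, and the gate in the paper comes from a learned gating convolution; the function names and `fold_div` split are assumptions for the sketch.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Fixed TSM-style shift over video features x of shape (T, C, H, W).

    A fraction of channels is shifted from the next frame, another fraction
    from the previous frame, and the rest is left in place. LGTSM makes
    this shifting pattern learnable instead of fixed.
    """
    T, C, H, W = x.shape
    fold = C // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]               # pull features from the future frame
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # pull features from the past frame
    out[:, 2 * fold:] = x[:, 2 * fold:]          # remaining channels unchanged
    return out

def gated_response(features, gates):
    """Gated-convolution idea: a soft sigmoid gate downweights masked areas.

    `features` and `gates` would both be outputs of convolutions in the
    real model; here they are just same-shaped arrays.
    """
    return features / (1.0 + np.exp(-gates))  # features * sigmoid(gates)
```

For example, with 2 frames and 8 channels (`fold = 1`), channel 0 of frame 0 receives channel 0 of frame 1, while channel 1 of frame 1 receives channel 1 of frame 0; boundary frames get zeros in the shifted channels.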

Results

Task              Dataset             Metric  Value   Model
3D                DAVIS               Ewarp   0.164   LGTSM
3D                DAVIS               PSNR    28.57   LGTSM
3D                DAVIS               SSIM    0.9409  LGTSM
3D                DAVIS               VFID    0.17    LGTSM
3D                YouTube-VOS 2018    Ewarp   0.1859  LGTSM
3D                YouTube-VOS 2018    PSNR    29.74   LGTSM
3D                YouTube-VOS 2018    SSIM    0.9504  LGTSM
3D                YouTube-VOS 2018    VFID    0.07    LGTSM
Video Inpainting  DAVIS               Ewarp   0.164   LGTSM
Video Inpainting  DAVIS               PSNR    28.57   LGTSM
Video Inpainting  DAVIS               SSIM    0.9409  LGTSM
Video Inpainting  DAVIS               VFID    0.17    LGTSM
Video Inpainting  YouTube-VOS 2018    Ewarp   0.1859  LGTSM
Video Inpainting  YouTube-VOS 2018    PSNR    29.74   LGTSM
Video Inpainting  YouTube-VOS 2018    SSIM    0.9504  LGTSM
Video Inpainting  YouTube-VOS 2018    VFID    0.07    LGTSM

Related Papers

RePaintGS: Reference-Guided Gaussian Splatting for Realistic and View-Consistent 3D Scene Inpainting (2025-07-11)
MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting (2025-06-30)
Video Virtual Try-on with Conditional Diffusion Transformer Inpainter (2025-06-26)
Let Your Video Listen to Your Music! (2025-06-23)
3DeepRep: 3D Deep Low-rank Tensor Representation for Hyperspectral Image Inpainting (2025-06-20)
VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models (2025-06-16)
Geological Field Restoration through the Lens of Image Inpainting (2025-06-05)
Follow-Your-Creation: Empowering 4D Creation through Video Inpainting (2025-06-05)