Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

2021-09-07 · ICCV 2021 · Tasks: Seeing Beyond the Visible, Video Inpainting
Paper · PDF · Code (official)

Abstract

Transformer, as a strong and flexible architecture for modelling long-range relations, has been widely explored in vision tasks. However, when applied to video inpainting, which requires fine-grained representation, existing methods still yield blurry edges in detailed regions due to hard patch splitting. We tackle this problem with FuseFormer, a Transformer model designed for video inpainting via fine-grained feature fusion, based on novel Soft Split and Soft Composition operations. Soft Split divides the feature map into many patches with a given overlapping interval; conversely, Soft Composition stitches the patches back into a whole feature map, summing the pixels in overlapping regions. These two modules are first used for tokenization before the Transformer layers and de-tokenization after them, providing an effective mapping between tokens and features. Sub-patch-level information interaction is thereby enabled, allowing more effective feature propagation between neighbouring patches and the synthesis of vivid content for hole regions in videos. Moreover, in FuseFormer we insert Soft Composition and Soft Split into the feed-forward network, giving the 1D linear layers the capability to model 2D structure and further enhancing sub-patch-level feature fusion. In both quantitative and qualitative evaluations, the proposed FuseFormer surpasses state-of-the-art methods, and we conduct detailed analyses to examine its superiority.
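The Soft Split and Soft Composition operations described in the abstract amount to an overlapping patch extraction and an overlap-add reconstruction. The following is an illustrative single-channel NumPy sketch of that idea, not the paper's implementation (which operates on multi-channel feature maps inside the network; `patch` and `stride` names here are our own):

```python
import numpy as np

def soft_split(feat, patch, stride):
    """Soft Split (sketch): cut a 2D feature map into overlapping patches.

    With stride < patch, neighbouring patches share pixels; this overlap
    is what enables the sub-patch-level interaction the paper describes.
    """
    H, W = feat.shape
    patches = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            patches.append(feat[i:i + patch, j:j + patch].copy())
    return np.stack(patches)

def soft_composition(patches, out_shape, patch, stride):
    """Soft Composition (sketch): stitch patches back into one feature map,
    summing pixel values wherever patches overlap."""
    out = np.zeros(out_shape)
    idx = 0
    H, W = out_shape
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            out[i:i + patch, j:j + patch] += patches[idx]
            idx += 1
    return out
```

On a 6x6 map of ones with `patch=4, stride=2`, `soft_split` yields four overlapping patches, and `soft_composition` produces larger values where more patches overlap, which is exactly the summation behaviour described above.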

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| 3D | DAVIS | Ewarp | 0.1362 | FuseFormer |
| 3D | DAVIS | PSNR | 32.54 | FuseFormer |
| 3D | DAVIS | SSIM | 0.97 | FuseFormer |
| 3D | DAVIS | VFID | 0.138 | FuseFormer |
| 3D | YouTube-VOS 2018 | Ewarp | 0.09 | FuseFormer |
| 3D | YouTube-VOS 2018 | PSNR | 33.29 | FuseFormer |
| 3D | YouTube-VOS 2018 | SSIM | 0.9681 | FuseFormer |
| 3D | YouTube-VOS 2018 | VFID | 0.053 | FuseFormer |
| 3D | HQVI (240p) | LPIPS | 0.0498 | FuseFormer |
| 3D | HQVI (240p) | PSNR | 29.92 | FuseFormer |
| 3D | HQVI (240p) | SSIM | 0.9365 | FuseFormer |
| 3D | HQVI (240p) | VFID | 0.2727 | FuseFormer |
| Video Inpainting | DAVIS | Ewarp | 0.1362 | FuseFormer |
| Video Inpainting | DAVIS | PSNR | 32.54 | FuseFormer |
| Video Inpainting | DAVIS | SSIM | 0.97 | FuseFormer |
| Video Inpainting | DAVIS | VFID | 0.138 | FuseFormer |
| Video Inpainting | YouTube-VOS 2018 | Ewarp | 0.09 | FuseFormer |
| Video Inpainting | YouTube-VOS 2018 | PSNR | 33.29 | FuseFormer |
| Video Inpainting | YouTube-VOS 2018 | SSIM | 0.9681 | FuseFormer |
| Video Inpainting | YouTube-VOS 2018 | VFID | 0.053 | FuseFormer |
| Video Inpainting | HQVI (240p) | LPIPS | 0.0498 | FuseFormer |
| Video Inpainting | HQVI (240p) | PSNR | 29.92 | FuseFormer |
| Video Inpainting | HQVI (240p) | SSIM | 0.9365 | FuseFormer |
| Video Inpainting | HQVI (240p) | VFID | 0.2727 | FuseFormer |
| Seeing Beyond the Visible | KITTI360-EX | Average PSNR | 18.91 | FuseFormer |

Related Papers

- Video Virtual Try-on with Conditional Diffusion Transformer Inpainter (2025-06-26)
- Let Your Video Listen to Your Music! (2025-06-23)
- VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models (2025-06-16)
- Follow-Your-Creation: Empowering 4D Creation through Video Inpainting (2025-06-05)
- DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds (2025-05-30)
- Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios (2025-05-14)
- DiTPainter: Efficient Video Inpainting with Diffusion Transformers (2025-04-22)
- Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting (2025-04-15)