Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images

Nagabhushan Somraj, Pranali Sancheti, Rajiv Soundararajan

2022-08-19 · Video Prediction · Motion Estimation · Temporal View Synthesis
Paper · PDF · Code (official)

Abstract

The challenge of graphically rendering high frame-rate videos on low compute devices can be addressed through periodic prediction of future frames to enhance the user experience in virtual reality applications. This is studied through the problem of temporal view synthesis (TVS), where the goal is to predict the next frames of a video given the previous frames and the head poses of the previous and the next frames. In this work, we consider the TVS of dynamic scenes in which both the user and objects are moving. We design a framework that decouples the motion into user and object motion to effectively use the available user motion while predicting the next frames. We predict the motion of objects by isolating and estimating the 3D object motion in the past frames and then extrapolating it. We employ multi-plane images (MPI) as a 3D representation of the scenes and model the object motion as the 3D displacement between the corresponding points in the MPI representation. In order to handle the sparsity in MPIs while estimating the motion, we incorporate partial convolutions and masked correlation layers to estimate corresponding points. The predicted object motion is then integrated with the given user or camera motion to generate the next frame. Using a disocclusion infilling module, we synthesize the regions uncovered due to the camera and object motion. We develop a new synthetic dataset for TVS of dynamic scenes consisting of 800 videos at full HD resolution. We show through experiments on our dataset and the MPI Sintel dataset that our model outperforms all the competing methods in the literature.
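The abstract mentions partial convolutions for handling the sparsity of MPI layers (many MPI plane pixels are empty, so a plain convolution would mix in invalid zeros). As a rough illustration of the idea, here is a minimal single-channel sketch of a partial convolution in the style of Liu et al. (2018): the convolution is evaluated only over valid pixels, renormalized by the fraction of valid entries in each window, and the validity mask is propagated. The function name, single-channel layout, and zero-padding choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def partial_conv2d(x, mask, kernel):
    """Sketch of a single-channel partial convolution (after Liu et al. 2018).

    x      -- 2D feature map
    mask   -- 2D validity mask: 1 where x holds real data, 0 where empty
    kernel -- 2D convolution kernel

    Output at each position is the masked convolution, rescaled by
    sum(1) / sum(mask) over the window, so results stay comparable
    regardless of how many valid pixels the window contains.
    This is an illustrative sketch, not the paper's implementation.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero out invalid pixels before padding, so they contribute nothing.
    xp = np.pad(x * mask, ((ph, ph), (pw, pw)))
    mp = np.pad(mask, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    new_mask = np.zeros_like(mask, dtype=float)
    k_ones = kernel.size  # normalizer numerator: window size (sum of ones)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            m = mp[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                out[i, j] = (kernel * xp[i:i + kh, j:j + kw]).sum() * (k_ones / valid)
                new_mask[i, j] = 1.0  # window saw at least one valid pixel
    return out, new_mask
```

With an averaging kernel and a constant, fully valid input, the renormalization makes the output constant even at the zero-padded borders, which is the property that lets sparse MPI planes be filtered without the empty regions biasing the result.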

Results

Task             | Dataset    | Metric  | Value  | Model
Video Prediction | MPI Sintel | LPIPS   | 0.223  | MCnet [villegas2017mcnet]
Video Prediction | MPI Sintel | PSNR    | 24     | MCnet [villegas2017mcnet]
Video Prediction | MPI Sintel | SSIM    | 0.7511 | MCnet [villegas2017mcnet]
Video Prediction | MPI Sintel | ST-RRED | 5.3    | MCnet [villegas2017mcnet]

Related Papers

DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
Epona: Autoregressive Diffusion World Model for Autonomous Driving (2025-06-30)
Whole-Body Conditioned Egocentric Video Prediction (2025-06-26)
MinD: Unified Visual Imagination and Control via Hierarchical World Models (2025-06-23)
EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training (2025-06-19)
AMPLIFY: Actionless Motion Priors for Robot Learning from Videos (2025-06-17)
Uncertainty-Driven Radar-Inertial Fusion for Instantaneous 3D Ego-Velocity Estimation (2025-06-17)