Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Staged Contact-Aware Global Human Motion Forecasting

Luca Scofano, Alessio Sampieri, Elisabeth Schiele, Edoardo De Matteis, Laura Leal-Taixé, Fabio Galasso

2023-09-16 · Human Pose Forecasting · Motion Estimation · Trajectory Forecasting · Motion Forecasting · Trajectory Prediction

Paper | PDF | Code (official)

Abstract

Scene-aware global human motion forecasting is critical for many applications, including virtual reality, robotics, and sports. The task combines human trajectory and pose forecasting within the provided scene context, which represents a significant challenge. So far, only Mao et al. (NeurIPS'22) have addressed scene-aware global motion, cascading the prediction of future scene contact points with global motion estimation. They perform the latter as end-to-end forecasting of future trajectories and poses. However, end-to-end prediction contrasts with the coarse-to-fine nature of the task and results in lower performance, as we demonstrate here empirically.

We propose STAG (STAGed contact-aware global human motion forecasting), a novel three-stage pipeline for predicting global human motion in a 3D environment. First, we model the interaction between the human and the scene as contact points. Second, we forecast the human trajectory within the scene, predicting the coarse motion of the body as a whole. The third and last stage matches a plausible fine human joint motion to the trajectory, taking the estimated contacts into account.

Compared to the state-of-the-art (SoA), STAG achieves a 1.8% and 16.2% overall improvement in pose and trajectory prediction, respectively, on the scene-aware GTA-IM dataset. A comprehensive ablation study confirms the advantages of staged modeling over end-to-end approaches. Furthermore, we establish the significance of a newly proposed temporal counter called the "time-to-go", which tells how long it is before reaching scene contacts and endpoints. Notably, STAG generalizes to datasets lacking a scene and achieves a new state-of-the-art performance on CMU-Mocap, without leveraging any social cues. Our code is released at: https://github.com/L-Scofano/STAG
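The staged structure described in the abstract (contacts → coarse trajectory → fine pose, with a "time-to-go" counter) can be sketched schematically. The toy Python snippet below is purely illustrative: the real STAG stages are learned networks, and every function name, placeholder heuristic, and array shape here is an assumption, not the authors' implementation.

```python
import numpy as np

def time_to_go(t, horizon):
    """Normalized 'time-to-go' counter: 1.0 at the start of the forecast,
    0.0 on reaching the contact/endpoint (illustrative formulation)."""
    return (horizon - t) / horizon

def predict_contacts(scene_points, past_poses):
    # Stage 1 (placeholder heuristic): pick the scene point closest to the
    # last observed root joint as the predicted future contact.
    root = past_poses[-1, 0]
    dists = np.linalg.norm(scene_points - root, axis=1)
    return scene_points[np.argmin(dists)]

def forecast_trajectory(past_traj, contact, horizon):
    # Stage 2 (placeholder): coarse whole-body motion, here a straight-line
    # interpolation from the last observed position toward the contact.
    start = past_traj[-1]
    steps = np.linspace(0.0, 1.0, horizon)[:, None]
    return start + steps * (contact - start)

def forecast_poses(past_poses, traj, horizon):
    # Stage 3 (placeholder): fine joint motion, here the last local pose
    # re-anchored frame by frame onto the forecast trajectory.
    local = past_poses[-1] - past_poses[-1, 0]      # joints relative to root
    return traj[:, None, :] + local[None]

# Toy inputs: 10 observed frames, 21 joints, a small synthetic point scene.
past_poses = np.random.randn(10, 21, 3)
scene_points = np.random.randn(100, 3)
horizon = 30

contact = predict_contacts(scene_points, past_poses)
traj = forecast_trajectory(past_poses[:, 0], contact, horizon)
poses = forecast_poses(past_poses, traj, horizon)
ttg = time_to_go(np.arange(horizon), horizon)       # per-frame counter
print(poses.shape)  # (30, 21, 3)
```

The point of the sketch is the data flow, not the heuristics: each stage consumes the previous stage's coarser output, which is the coarse-to-fine decomposition the paper argues for over end-to-end forecasting.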

Results

Task            | Dataset        | Metric     | Value | Model
Pose Estimation | GTA-IM Dataset | Path Error | 92.3  | STAG
Pose Estimation | GTA-IM Dataset | Pose Error | 60.3  | STAG

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
- ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture (2025-07-09)
- GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction (2025-06-26)
- FlightKooba: A Fast Interpretable FTP Model (2025-06-24)
- AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation (2025-06-24)