Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Epipolar Transformers

Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu

2020-05-10 · CVPR 2020

Tasks: 3D Human Pose Estimation · 3D Hand Pose Estimation · Stereo Matching · Pose Estimation · 3D Pose Estimation · 2D Pose Estimation

Paper · PDF · Code (official)

Abstract

A common approach to localizing 3D human joints in a synchronized and calibrated multi-view setup consists of two steps: (1) apply a 2D detector separately on each view to localize joints in 2D, and (2) perform robust triangulation on the 2D detections from each view to acquire the 3D joint locations. However, in step 1, the 2D detector must resolve challenging cases such as occlusions and oblique viewing angles purely in 2D, without leveraging any 3D information, even though such cases could potentially be better resolved in 3D. Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'. Experiments on InterHand and Human3.6M show that our approach has consistent improvements over the baselines. Specifically, when no external data is used, our Human3.6M model trained with a ResNet-50 backbone at image size 256 × 256 outperforms the state of the art by 4.23 mm, achieving an MPJPE of 26.9 mm.
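The core operation described above is easy to sketch: for a query location p in the current view, the fundamental matrix between the two views yields an epipolar line in the neighboring view; features are sampled along that line, matched against the feature at p, and their softmax-weighted sum approximates the feature at the correspondence p'. Below is a minimal NumPy sketch of this idea. The helper names (epipolar_line, sample_points_on_line, epipolar_feature), the array shapes, and the nearest-neighbour lookup are illustrative assumptions, not the authors' implementation, which performs differentiable sampling inside the network.

```python
import numpy as np

def epipolar_line(F, p):
    """Epipolar line l' = F @ p_h of point p in the neighboring view.

    Returned as homogeneous coefficients (a, b, c): a*x + b*y + c = 0.
    """
    p_h = np.array([p[0], p[1], 1.0])
    return F @ p_h

def sample_points_on_line(line, width, height, num_samples=64):
    """Clip the line to the image rectangle and sample points along it."""
    a, b, c = line
    pts = []
    if abs(b) > 1e-9:  # intersections with the left/right image borders
        for x in (0.0, width - 1.0):
            y = -(a * x + c) / b
            if 0.0 <= y <= height - 1.0:
                pts.append((x, y))
    if abs(a) > 1e-9:  # intersections with the top/bottom image borders
        for y in (0.0, height - 1.0):
            x = -(b * y + c) / a
            if 0.0 <= x <= width - 1.0:
                pts.append((x, y))
    if len(pts) < 2:
        return None  # the epipolar line misses the image entirely
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    t = np.linspace(0.0, 1.0, num_samples)
    return np.stack([x0 + t * (x1 - x0), y0 + t * (y1 - y0)], axis=1)

def epipolar_feature(feat_cur, feat_ref, F, p, num_samples=64):
    """Approximate the neighboring-view feature at the correspondence p' of p.

    feat_cur, feat_ref: (C, H, W) feature maps of the two views.
    F: 3x3 fundamental matrix mapping current-view points to
       neighboring-view epipolar lines.  p: (x, y) pixel in the current view.
    """
    C, H, W = feat_ref.shape
    samples = sample_points_on_line(epipolar_line(F, p), W, H, num_samples)
    if samples is None:
        return np.zeros(C)
    # Nearest-neighbour lookup for brevity; the paper samples differentiably.
    xs = np.clip(np.rint(samples[:, 0]).astype(int), 0, W - 1)
    ys = np.clip(np.rint(samples[:, 1]).astype(int), 0, H - 1)
    candidates = feat_ref[:, ys, xs].T              # (K, C) features on the line
    query = feat_cur[:, int(p[1]), int(p[0])]       # (C,) feature at p
    # Dot-product matching along the line, softmax-weighted aggregation.
    scores = candidates @ query                     # (K,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ candidates                     # (C,) ≈ feature at p'
```

In the paper, this approximated feature at p' is then fused with the feature at p through a learned combination, so the downstream layers of the 2D detector operate on a 3D-aware representation.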

Results

| Task | Dataset | Metric | Value | Model |
|------|---------|--------|-------|-------|
| 3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 26.9 | Epipolar Transformer + R50 256×256 + RPSM |
| Pose Estimation | Human3.6M | Average MPJPE (mm) | 26.9 | Epipolar Transformer + R50 256×256 + RPSM |
| 1 Image, 2*2 Stitching | Human3.6M | Average MPJPE (mm) | 26.9 | Epipolar Transformer + R50 256×256 + RPSM |
| 3D Hand Pose Estimation | InterHand2.6M | MPJPE (mm) | 4.91 | Epipolar Transformers |
| Hand Pose Estimation | InterHand2.6M | MPJPE (mm) | 4.91 | Epipolar Transformers |
| Pose Estimation | InterHand2.6M | MPJPE (mm) | 4.91 | Epipolar Transformers |
| 1 Image, 2*2 Stitching | InterHand2.6M | MPJPE (mm) | 4.91 | Epipolar Transformers |
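All values above are MPJPE (Mean Per-Joint Position Error), the standard metric on these leaderboards: the Euclidean distance between each predicted joint and its ground truth, averaged over joints and frames, in millimetres. A minimal sketch follows; the function name mpjpe and the (N, J, 3) input shapes are illustrative assumptions, and leaderboard numbers typically follow dataset-specific protocols (e.g. root-joint alignment on Human3.6M), which this sketch omits.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints, in the units of the inputs (mm here).

    pred, gt: (N, J, 3) arrays of N poses with J 3D joints each.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example: 10 frames of a 17-joint skeleton (Human3.6M uses 17 joints).
rng = np.random.default_rng(0)
pred = rng.random((10, 17, 3)) * 1000.0
gt = rng.random((10, 17, 3)) * 1000.0
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```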

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)