
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Exploring intermediate representation for monocular vehicle pose estimation

Shichao Li, Zengqiang Yan, Hongyang Li, Kwang-Ting Cheng

2020-11-17 | CVPR 2021 | Tasks: Representation Learning, Vehicle Pose Estimation, Pose Estimation, 3D Pose Estimation

Abstract

We present a new learning-based framework to recover vehicle pose in SO(3) from a single RGB image. In contrast to previous works that map from local appearance to observation angles, we explore a progressive approach by extracting meaningful Intermediate Geometrical Representations (IGRs) to estimate egocentric vehicle orientation. This approach features a deep model that transforms perceived intensities to IGRs, which are mapped to a 3D representation encoding object orientation in the camera coordinate system. Core problems are what IGRs to use and how to learn them more effectively. We answer the former question by designing IGRs based on an interpolated cuboid that derives from primitive 3D annotation readily. The latter question motivates us to incorporate geometry knowledge with a new loss function based on a projective invariant. This loss function allows unlabeled data to be used in the training stage to improve representation learning. Without additional labels, our system outperforms previous monocular RGB-based methods for joint vehicle detection and pose estimation on the KITTI benchmark, achieving performance even comparable to stereo methods. Code and pre-trained models are available at this https URL.
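The abstract does not say which projective invariant the loss is built on. As an illustration of the general idea only, the sketch below uses the cross-ratio of four collinear points, a classic quantity that is preserved by perspective projection. The choice of the cross-ratio, the pinhole model with a hypothetical `focal` parameter, and all point coordinates are assumptions made for this example, not details taken from the paper.

```python
import math


def cross_ratio(a, b, c, d):
    """Cross-ratio (|AC| * |BD|) / (|BC| * |AD|) of four collinear points."""
    return (math.dist(a, c) * math.dist(b, d)) / (math.dist(b, c) * math.dist(a, d))


def project(point, focal=1.0):
    """Pinhole projection of a 3D camera-frame point (z > 0) onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


# Four equally spaced points on a 3D line in front of the camera (made-up values).
start = (-1.0, 0.5, 4.0)
direction = (0.8, -0.1, 0.6)
points_3d = [
    tuple(s + t * d for s, d in zip(start, direction))
    for t in (0.0, 1.0, 2.0, 3.0)
]
points_2d = [project(p) for p in points_3d]

cr_3d = cross_ratio(*points_3d)  # measured on the 3D line
cr_2d = cross_ratio(*points_2d)  # measured on the projected image points

# The two values agree up to floating-point error, even though the
# individual distances between points change under projection.
print(cr_3d, cr_2d)
```

Because such a quantity can be evaluated on predicted 2D points without any pose label, a penalty on its deviation from the known 3D value is one way a loss could draw a training signal from unlabeled images, which matches the role the abstract describes for its loss.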

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | KITTI | Average Orientation Similarity | 89.43 | Ego-Net |
| Pose Estimation | KITTI Cars Hard | Average Orientation Similarity | 80.96 | Ego-Net (Monocular RGB only) |
| 3D | KITTI | Average Orientation Similarity | 89.43 | Ego-Net |
| 3D | KITTI Cars Hard | Average Orientation Similarity | 80.96 | Ego-Net (Monocular RGB only) |
| 1 Image, 2*2 Stitchi | KITTI | Average Orientation Similarity | 89.43 | Ego-Net |
| 1 Image, 2*2 Stitchi | KITTI Cars Hard | Average Orientation Similarity | 80.96 | Ego-Net (Monocular RGB only) |
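For readers unfamiliar with the metric, the sketch below shows the per-detection orientation similarity term that Average Orientation Similarity (AOS) is built on in the KITTI benchmark, (1 + cos Δθ) / 2. It is a simplified illustration: the full AOS additionally averages this term over recall levels and only counts matched detections, which is omitted here. The function names and the sample angles are hypothetical.

```python
import math


def orientation_similarity(pred_angle, gt_angle):
    """Similarity in [0, 1]: 1 for identical headings, 0 for opposite headings."""
    return (1.0 + math.cos(pred_angle - gt_angle)) / 2.0


def mean_orientation_similarity(pred_angles, gt_angles):
    """Mean similarity over already-matched (prediction, ground truth) pairs."""
    if len(pred_angles) != len(gt_angles) or not pred_angles:
        raise ValueError("need equally sized, non-empty angle lists")
    pairs = zip(pred_angles, gt_angles)
    return sum(orientation_similarity(p, g) for p, g in pairs) / len(pred_angles)


# Made-up angles in radians for three matched detections.
predicted = [0.10, 1.50, -2.00]
ground_truth = [0.00, 1.60, -2.20]
score = mean_orientation_similarity(predicted, ground_truth)
print(round(score, 4))
```

The cosine makes the score periodic in the angle difference, so errors of 2π are not penalized, while a heading that is flipped by π scores 0.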

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)