Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PE-former: Pose Estimation Transformer

Paschalis Panteleris, Antonis Argyros

Published: 2021-12-09
Tasks: Image Classification, Pose Estimation

Abstract

Vision transformer architectures have been demonstrated to work very effectively for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction. In this paper we investigate the use of a pure transformer architecture (i.e., one with no CNN backbone) for the problem of 2D body pose estimation. We evaluate two ViT architectures on the COCO dataset. We demonstrate that using an encoder-decoder transformer architecture yields state of the art results on this estimation problem.
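To make the "no CNN backbone" idea concrete, the following is a minimal sketch of ViT-style patch tokenization, the step that stands in for convolutional feature extraction in a pure transformer. It is not the authors' code. The function name `patchify`, the nested-list image layout (height x width x channels), and the patch size of 8 used in the note below (guessed from the "p8" suffix in the model name) are all assumptions made for illustration.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only. Returns a list of tokens in
    row-major patch order; each token is a flat list of
    patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: rows first, then pixels, then channels.
            token = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            tokens.append(token)
    return tokens
```

Under these assumptions, a 16 x 16 RGB image with `patch_size=8` yields 4 tokens of 192 values each. In a ViT-style model each token would then typically be linearly projected and combined with a positional embedding before entering the transformer encoder; those steps are omitted here.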

Results

Task                  Dataset                           Metric  Value  Model
Pose Estimation       COCO (Common Objects in Context)  AP      72.6   PEFORMER-Xcit-dino-p8
Pose Estimation       COCO (Common Objects in Context)  AR      79.4   PEFORMER-Xcit-dino-p8
3D                    COCO (Common Objects in Context)  AP      72.6   PEFORMER-Xcit-dino-p8
3D                    COCO (Common Objects in Context)  AR      79.4   PEFORMER-Xcit-dino-p8
1 Image, 2*2 Stitchi  COCO (Common Objects in Context)  AP      72.6   PEFORMER-Xcit-dino-p8
1 Image, 2*2 Stitchi  COCO (Common Objects in Context)  AR      79.4   PEFORMER-Xcit-dino-p8

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)