Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

2023-06-14 | ICCV 2023 | Visual Tracking, Point Tracking, Motion Estimation

Abstract

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found on our project webpage.
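The two stages described in the abstract can be illustrated with a toy sketch. This is not the TAPIR implementation: the function names (`match_per_frame`, `refine`), the plain dot-product similarity, the fixed search window, and the moving-average query update are all assumptions made for illustration. The actual model relies on learned features and a learned refinement network.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureGrid = List[List[Vector]]  # indexed as grid[row][col] -> feature vector
Point = Tuple[int, int]           # (row, col)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def match_per_frame(query: Vector, frames: List[FeatureGrid]) -> List[Point]:
    """Stage 1 (sketch): pick the best-matching location in each frame independently."""
    matches = []
    for grid in frames:
        candidates = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
        matches.append(max(candidates, key=lambda rc: dot(query, grid[rc[0]][rc[1]])))
    return matches


def refine(query: Vector, frames: List[FeatureGrid], init: List[Point],
           radius: int = 1, iterations: int = 2, momentum: float = 0.5):
    """Stage 2 (hand-written stand-in): alternate between re-localising each
    point inside a small window and updating the query feature from the
    features found along the current trajectory."""
    track = list(init)
    q = list(query)
    for _ in range(iterations):
        for t, grid in enumerate(frames):
            r0, c0 = track[t]
            window = [
                (r, c)
                for r in range(max(0, r0 - radius), min(len(grid), r0 + radius + 1))
                for c in range(max(0, c0 - radius), min(len(grid[0]), c0 + radius + 1))
            ]
            # Local correlation: search only near the current estimate.
            track[t] = max(window, key=lambda rc: dot(q, grid[rc[0]][rc[1]]))
        # Blend the query with the mean feature along the trajectory.
        mean = [
            sum(frames[t][r][c][k] for t, (r, c) in enumerate(track)) / len(track)
            for k in range(len(q))
        ]
        q = [momentum * a + (1.0 - momentum) * b for a, b in zip(q, mean)]
    return track, q
```

A call such as `refine(query, frames, match_per_frame(query, frames))` chains the two stages. Because stage 1 treats every frame independently, a point that is lost in one frame does not affect the initial estimate in any other frame.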

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Tracking | DAVIS | Average Jaccard | 61.3 | TAPIR (Panning MOVi-E) |
| Visual Tracking | DAVIS | Average Jaccard | 59.8 | TAPIR (MOVi-E) |
| Visual Tracking | RGB-Stacking | Average Jaccard | 66.2 | TAPIR (MOVi-E) |
| Visual Tracking | RGB-Stacking | Average Jaccard | 62.7 | TAPIR (Panning MOVi-E) |
| Visual Tracking | Kubric | Average Jaccard | 84.7 | TAPIR (Panning MOVi-E) |
| Visual Tracking | Kubric | Average Jaccard | 84.3 | TAPIR (MOVi-E) |
| Visual Tracking | Kinetics | Average Jaccard | 57.2 | TAPIR (Panning MOVi-E) |
| Visual Tracking | Kinetics | Average Jaccard | 57.1 | TAPIR (MOVi-E) |
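For readers unfamiliar with the metric, the sketch below shows one way Average Jaccard (AJ) can be computed, following the definition commonly used for the TAP-Vid benchmark: a Jaccard score over position and occlusion predictions, averaged across pixel distance thresholds of 1, 2, 4, 8, and 16. The function name and argument layout are assumptions for illustration, and the sketch omits details of the official evaluation such as excluding query frames and rescaling coordinates to a fixed resolution.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def average_jaccard(
    gt_points: Sequence[Point],
    gt_visible: Sequence[bool],
    pred_points: Sequence[Point],
    pred_visible: Sequence[bool],
    thresholds: Sequence[float] = (1, 2, 4, 8, 16),
) -> float:
    """Mean over thresholds of TP / (ground-truth positives + FP)."""
    scores = []
    for thr in thresholds:
        true_pos = false_pos = gt_pos = 0
        for gp, gv, pp, pv in zip(gt_points, gt_visible, pred_points, pred_visible):
            within = math.dist(gp, pp) < thr
            if gv:
                gt_pos += 1  # every visible ground-truth point counts
            if gv and pv and within:
                true_pos += 1  # visible, predicted visible, and close enough
            if pv and (not gv or not within):
                false_pos += 1  # predicted visible but occluded or too far away
        denom = gt_pos + false_pos
        scores.append(true_pos / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

The function returns a fraction in [0, 1]; the values in the table are reported as percentages, so a returned 0.613 would correspond to 61.3.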
