Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BootsTAP: Bootstrapped Training for Tracking-Any-Point

Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, Andrew Zisserman

2024-02-01 · Point Tracking
Paper · PDF · Code · Code (official)

Abstract

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We achieve state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/
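The core of the student-teacher setup described in the abstract is a self-distillation loss: a frozen teacher produces pseudo-label tracks on unlabeled video, and the student is trained to match them wherever the teacher believes the point is visible. The sketch below is a minimal, hypothetical illustration of that idea (the function name, the Huber loss, and the `delta` parameter are assumptions, not the paper's released implementation):

```python
import numpy as np

def bootstrap_loss(student_tracks, teacher_tracks, teacher_visible, delta=6.0):
    """Self-distillation loss sketch: penalize student point predictions that
    drift from the teacher's pseudo-labels, but only where the teacher marks
    the point visible. A Huber-style loss (hypothetical choice here) keeps
    outlier pseudo-labels from dominating the gradient.

    student_tracks, teacher_tracks: (frames, points, 2) pixel coordinates
    teacher_visible: (frames, points) boolean pseudo-visibility mask
    """
    # Per-point Euclidean error between student and teacher tracks
    err = np.linalg.norm(student_tracks - teacher_tracks, axis=-1)  # (T, N)
    # Huber: quadratic near zero, linear beyond delta pixels
    huber = np.where(err <= delta, 0.5 * err**2, delta * (err - 0.5 * delta))
    # Mask out points the teacher considers occluded, then average
    mask = teacher_visible.astype(np.float64)
    return (huber * mask).sum() / np.maximum(mask.sum(), 1.0)
```

In a training loop, `teacher_tracks` would come from running the frozen teacher on the full clip while the student sees an augmented (e.g. cropped or corrupted) view of the same frames, so the loss enforces consistency across views.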

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Tracking | TAP-Vid-Kinetics | Average Jaccard | 61.4 | BootsTAPIR |
| Visual Tracking | TAP-Vid-Kinetics | Average PCK | 74.2 | BootsTAPIR |
| Visual Tracking | TAP-Vid-Kinetics | Occlusion Accuracy | 89.7 | BootsTAPIR |
| Visual Tracking | TAP-Vid-DAVIS | Average Jaccard | 66.2 | BootsTAPIR |
| Visual Tracking | TAP-Vid-DAVIS | Average PCK | 78.1 | BootsTAPIR |
| Visual Tracking | TAP-Vid-DAVIS | Occlusion Accuracy | 91 | BootsTAPIR |
| Visual Tracking | TAP-Vid-RGB-Stacking | Average Jaccard | 72.4 | BootsTAPIR |
| Visual Tracking | TAP-Vid-RGB-Stacking | Average PCK | 83.1 | BootsTAPIR |
| Visual Tracking | TAP-Vid-RGB-Stacking | Occlusion Accuracy | 91.2 | BootsTAPIR |
| Point Tracking | TAP-Vid-Kinetics | Average Jaccard | 61.4 | BootsTAPIR |
| Point Tracking | TAP-Vid-Kinetics | Average PCK | 74.2 | BootsTAPIR |
| Point Tracking | TAP-Vid-Kinetics | Occlusion Accuracy | 89.7 | BootsTAPIR |
| Point Tracking | TAP-Vid-DAVIS | Average Jaccard | 66.2 | BootsTAPIR |
| Point Tracking | TAP-Vid-DAVIS | Average PCK | 78.1 | BootsTAPIR |
| Point Tracking | TAP-Vid-DAVIS | Occlusion Accuracy | 91 | BootsTAPIR |
| Point Tracking | TAP-Vid-RGB-Stacking | Average Jaccard | 72.4 | BootsTAPIR |
| Point Tracking | TAP-Vid-RGB-Stacking | Average PCK | 83.1 | BootsTAPIR |
| Point Tracking | TAP-Vid-RGB-Stacking | Occlusion Accuracy | 91.2 | BootsTAPIR |
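The three metrics in the table are the standard TAP-Vid quantities: Occlusion Accuracy (fraction of frames where predicted visibility matches ground truth), Average PCK (position accuracy averaged over the benchmark's pixel thresholds), and Average Jaccard (which jointly scores position and visibility). The sketch below computes them under simplifying assumptions (boolean visibility masks, pixel-space tracks); it is an illustrative reimplementation, not the official TAP-Vid evaluation code:

```python
import numpy as np

THRESHOLDS = (1, 2, 4, 8, 16)  # pixel thresholds used by the TAP-Vid benchmark

def tapvid_metrics(pred_tracks, pred_visible, gt_tracks, gt_visible):
    """Sketch of TAP-Vid metrics for one clip.

    pred_tracks, gt_tracks: (frames, points, 2) arrays of xy positions
    pred_visible, gt_visible: (frames, points) boolean visibility masks
    """
    # Occlusion accuracy: visibility prediction treated as binary classification
    occ_acc = np.mean(pred_visible == gt_visible)

    dist = np.linalg.norm(pred_tracks - gt_tracks, axis=-1)  # (T, N)
    pcks, jaccards = [], []
    for thr in THRESHOLDS:
        within = dist < thr
        # PCK at this threshold: measured only where the point is visible in GT
        pcks.append(within[gt_visible].mean())
        # Jaccard: TP / (TP + FP + FN)
        tp = (within & gt_visible & pred_visible).sum()
        fp = (pred_visible & (~gt_visible | ~within)).sum()   # predicted visible but wrong
        fn = (gt_visible & ~(pred_visible & within)).sum()    # GT visible but missed
        jaccards.append(tp / max(tp + fp + fn, 1))
    return {
        "occlusion_accuracy": float(occ_acc),
        "average_pck": float(np.mean(pcks)),
        "average_jaccard": float(np.mean(jaccards)),
    }
```

A perfect tracker scores 1.0 on all three quantities; the benchmark reports them as percentages, matching the table above.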

Related Papers

- Integrated Switched Capacitor Array and Synchronous Charge Extraction with Adaptive Hybrid MPPT for Piezoelectric Harvesters (2025-07-16)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second (2025-07-14)
- Learning to Track Any Points from Human Motion (2025-07-08)
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations (2025-07-01)
- DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model (2025-05-18)
- You Are Your Best Teacher: Semi-Supervised Surgical Point Tracking with Cycle-Consistent Self-Distillation (2025-05-09)