Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BootsTAP: Bootstrapped Training for Tracking-Any-Point

Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, Andrew Zisserman

2024-02-01 · Point Tracking
Paper · PDF · Code · Code (official)

Abstract

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We achieve state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/
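The core of the student-teacher setup described in the abstract is a self-distillation loss: a frozen teacher produces pseudo-label tracks on unlabeled video, and the student is trained to match them wherever the teacher believes the point is visible. The sketch below is a minimal, hypothetical illustration of that idea (the function name, the Huber loss, and the `delta` parameter are assumptions, not the paper's released implementation):

```python
import numpy as np

def bootstrap_loss(student_tracks, teacher_tracks, teacher_visible, delta=6.0):
    """Self-distillation loss sketch: penalize student point predictions that
    drift from the teacher's pseudo-labels, but only where the teacher marks
    the point visible. A Huber-style loss (hypothetical choice here) keeps
    outlier pseudo-labels from dominating the gradient.

    student_tracks, teacher_tracks: (frames, points, 2) pixel coordinates
    teacher_visible: (frames, points) boolean pseudo-visibility mask
    """
    # Per-point Euclidean error between student and teacher tracks
    err = np.linalg.norm(student_tracks - teacher_tracks, axis=-1)  # (T, N)
    # Huber: quadratic near zero, linear beyond delta pixels
    huber = np.where(err <= delta, 0.5 * err**2, delta * (err - 0.5 * delta))
    # Mask out points the teacher considers occluded, then average
    mask = teacher_visible.astype(np.float64)
    return (huber * mask).sum() / np.maximum(mask.sum(), 1.0)
```

In a training loop, `teacher_tracks` would come from running the frozen teacher on the full clip while the student sees an augmented (e.g. cropped or corrupted) view of the same frames, so the loss enforces consistency across views.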

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Tracking | TAP-Vid-Kinetics | Average Jaccard | 61.4 | BootsTAPIR |
| Visual Tracking | TAP-Vid-Kinetics | Average PCK | 74.2 | BootsTAPIR |
| Visual Tracking | TAP-Vid-Kinetics | Occlusion Accuracy | 89.7 | BootsTAPIR |
| Visual Tracking | TAP-Vid-DAVIS | Average Jaccard | 66.2 | BootsTAPIR |
| Visual Tracking | TAP-Vid-DAVIS | Average PCK | 78.1 | BootsTAPIR |
| Visual Tracking | TAP-Vid-DAVIS | Occlusion Accuracy | 91 | BootsTAPIR |
| Visual Tracking | TAP-Vid-RGB-Stacking | Average Jaccard | 72.4 | BootsTAPIR |
| Visual Tracking | TAP-Vid-RGB-Stacking | Average PCK | 83.1 | BootsTAPIR |
| Visual Tracking | TAP-Vid-RGB-Stacking | Occlusion Accuracy | 91.2 | BootsTAPIR |
| Point Tracking | TAP-Vid-Kinetics | Average Jaccard | 61.4 | BootsTAPIR |
| Point Tracking | TAP-Vid-Kinetics | Average PCK | 74.2 | BootsTAPIR |
| Point Tracking | TAP-Vid-Kinetics | Occlusion Accuracy | 89.7 | BootsTAPIR |
| Point Tracking | TAP-Vid-DAVIS | Average Jaccard | 66.2 | BootsTAPIR |
| Point Tracking | TAP-Vid-DAVIS | Average PCK | 78.1 | BootsTAPIR |
| Point Tracking | TAP-Vid-DAVIS | Occlusion Accuracy | 91 | BootsTAPIR |
| Point Tracking | TAP-Vid-RGB-Stacking | Average Jaccard | 72.4 | BootsTAPIR |
| Point Tracking | TAP-Vid-RGB-Stacking | Average PCK | 83.1 | BootsTAPIR |
| Point Tracking | TAP-Vid-RGB-Stacking | Occlusion Accuracy | 91.2 | BootsTAPIR |
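The three metrics in the table are the standard TAP-Vid quantities: Occlusion Accuracy (fraction of frames where predicted visibility matches ground truth), Average PCK (position accuracy averaged over the benchmark's pixel thresholds), and Average Jaccard (which jointly scores position and visibility). The sketch below computes them under simplifying assumptions (boolean visibility masks, pixel-space tracks); it is an illustrative reimplementation, not the official TAP-Vid evaluation code:

```python
import numpy as np

THRESHOLDS = (1, 2, 4, 8, 16)  # pixel thresholds used by the TAP-Vid benchmark

def tapvid_metrics(pred_tracks, pred_visible, gt_tracks, gt_visible):
    """Sketch of TAP-Vid metrics for one clip.

    pred_tracks, gt_tracks: (frames, points, 2) arrays of xy positions
    pred_visible, gt_visible: (frames, points) boolean visibility masks
    """
    # Occlusion accuracy: visibility prediction treated as binary classification
    occ_acc = np.mean(pred_visible == gt_visible)

    dist = np.linalg.norm(pred_tracks - gt_tracks, axis=-1)  # (T, N)
    pcks, jaccards = [], []
    for thr in THRESHOLDS:
        within = dist < thr
        # PCK at this threshold: measured only where the point is visible in GT
        pcks.append(within[gt_visible].mean())
        # Jaccard: TP / (TP + FP + FN)
        tp = (within & gt_visible & pred_visible).sum()
        fp = (pred_visible & (~gt_visible | ~within)).sum()   # predicted visible but wrong
        fn = (gt_visible & ~(pred_visible & within)).sum()    # GT visible but missed
        jaccards.append(tp / max(tp + fp + fn, 1))
    return {
        "occlusion_accuracy": float(occ_acc),
        "average_pck": float(np.mean(pcks)),
        "average_jaccard": float(np.mean(jaccards)),
    }
```

A perfect tracker scores 1.0 on all three quantities; the benchmark reports them as percentages, matching the table above.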

Related Papers

- Integrated Switched Capacitor Array and Synchronous Charge Extraction with Adaptive Hybrid MPPT for Piezoelectric Harvesters (2025-07-16)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second (2025-07-14)
- Learning to Track Any Points from Human Motion (2025-07-08)
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations (2025-07-01)
- DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model (2025-05-18)
- You Are Your Best Teacher: Semi-Supervised Surgical Point Tracking with Cycle-Consistent Self-Distillation (2025-05-09)