Do Different Tracking Tasks Require Different Appearance Models?

Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto

2021-07-05, NeurIPS 2021

Tasks: Visual Object Tracking, Semi-Supervised Video Object Segmentation, Visual Tracking, Multi-Object Tracking and Segmentation, Multi-Object Tracking, Pose Estimation, Video Object Segmentation, Object Tracking, Pose Tracking, Pose Prediction, Online Multi-Object Tracking, Video Instance Segmentation, Video Object Tracking, Multiple People Tracking


Abstract

Tracking objects of interest in a video is one of the most popular and widely applicable problems in computer vision. However, over the years, a Cambrian explosion of use cases and benchmarks has fragmented the problem into a multitude of different experimental setups. As a consequence, the literature has fragmented too, and novel approaches proposed by the community are now usually specialised to fit only one specific setup. To understand to what extent this specialisation is necessary, in this work we present UniTrack, a solution that addresses five different tasks within the same framework. UniTrack consists of a single, task-agnostic appearance model, which can be learned in a supervised or self-supervised fashion, and multiple "heads" that address individual tasks and do not require training. We show that most tracking tasks can be solved within this framework, and that the same appearance model can be used to obtain results that are competitive with specialised methods for most of the tasks considered. The framework also allows us to analyse appearance models obtained with the most recent self-supervised methods, thus extending their evaluation and comparison to a larger variety of important problems.
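
The abstract does not say how the training-free heads work, so the following is only an illustrative sketch of the general idea, not UniTrack's actual implementation. It assumes that a fixed appearance model has already produced one feature vector per tracked object and per new detection, and it links them by cosine similarity with a simple greedy matcher. The names `cosine` and `associate`, the `min_sim` threshold, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(track_feats, det_feats, min_sim=0.5):
    """Greedily match tracks to detections, most similar pairs first.

    Returns a dict mapping track index -> detection index. No parameters
    are learned here: the head only compares features from a fixed model.
    The min_sim threshold is an arbitrary illustrative value.
    """
    pairs = sorted(
        (
            (cosine(t, d), i, j)
            for i, t in enumerate(track_feats)
            for j, d in enumerate(det_feats)
        ),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if i in used_tracks or j in used_dets:
            continue  # each track and detection is matched at most once
        matches[i] = j
        used_tracks.add(i)
        used_dets.add(j)
    return matches


# Toy features standing in for the output of an appearance model.
tracks = [[1.0, 0.0], [0.0, 1.0]]
dets = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches = associate(tracks, dets)
print(matches)
```

In this toy example the third detection resembles neither track, so it stays unmatched and could start a new track. Greedy matching is used here only for brevity; an optimal assignment solver is a common alternative.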

Results

Task	Dataset	Metric	Value	Model
Multi-Object Tracking	MOTS20	IDF1	67.2	UniTrack
Multi-Object Tracking	MOTS20	IDs	622	UniTrack
Multi-Object Tracking	MOTS20	sMOTSA	68.9	UniTrack
Multi-Object Tracking	MOT16	IDF1	71.8	UniTrack
Multi-Object Tracking	MOT16	IDs	683	UniTrack
Multi-Object Tracking	MOT16	MOTA	74.7	UniTrack
Pose Estimation	J-HMDB	Mean PCK@0.1	58.3	UniTrack_i18
Pose Estimation	J-HMDB	Mean PCK@0.2	80.5	UniTrack_i18
Object Tracking	MOTS20	IDF1	67.2	UniTrack
Object Tracking	MOTS20	IDs	622	UniTrack
Object Tracking	MOTS20	sMOTSA	68.9	UniTrack
Object Tracking	MOT16	IDF1	71.8	UniTrack
Object Tracking	MOT16	IDs	683	UniTrack
Object Tracking	MOT16	MOTA	74.7	UniTrack
Object Tracking	OTB-2015	AUC	0.618	UniTrack_DCF
Pose Tracking	PoseTrack2018	IDF1	73.2	UniTrack
Pose Tracking	PoseTrack2018	IDs	6760	UniTrack
Pose Tracking	PoseTrack2018	MOTA	63.5	UniTrack
Video Object Segmentation	DAVIS 2017	mIoU	58.4	UniTrack
Video Instance Segmentation	YouTube-VIS validation	mask AP	30.1	UniTrack
Visual Object Tracking	OTB-2015	AUC	0.618	UniTrack_DCF
