
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Joint-task Self-supervised Learning for Temporal Correspondence

Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Date: 2019-09-26
Venue: NeurIPS 2019 12
Tasks: Unsupervised Video Object Segmentation, Semi-Supervised Video Object Segmentation, Self-Supervised Learning, Object Tracking

Abstract

This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between both tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization helps reduce ambiguities in fine-grained matching by narrowing down search regions, while fine-grained matching provides bottom-up features to facilitate region-level localization. Our method outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the fully supervised affinity feature representation obtained from a ResNet-18 pre-trained on ImageNet.
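To make the idea of an inter-frame affinity matrix concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it only illustrates the generic pattern of a softmax affinity between per-location features of two frames, followed by label propagation through that affinity. The function names, the L2 normalization, the temperature value, and the toy features are all assumptions made for illustration.

```python
import math

def normalize(vector):
    """Scale a feature vector to unit length (an assumption of this sketch)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def affinity_matrix(feats_a, feats_b, temperature=0.1):
    """Return A (N x M) where column j is a softmax over the N locations of
    frame A, scored by similarity to location j of frame B."""
    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    n, m = len(a), len(b)
    affinity = [[0.0] * m for _ in range(n)]
    for j in range(m):
        logits = [sum(x * y for x, y in zip(a[i], b[j])) / temperature
                  for i in range(n)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        for i in range(n):
            affinity[i][j] = exps[i] / total
    return affinity

def propagate(labels_a, affinity):
    """Carry per-location label scores from frame A to frame B:
    labels_b[j] = sum_i affinity[i][j] * labels_a[i]."""
    n, m, k = len(labels_a), len(affinity[0]), len(labels_a[0])
    return [[sum(affinity[i][j] * labels_a[i][c] for i in range(n))
             for c in range(k)] for j in range(m)]

# Toy data (hypothetical): three locations per frame, 2-D features.
feats_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
feats_b = [[0.1, 0.9], [-0.8, 0.1], [0.9, 0.2]]   # roughly A's locations 1, 2, 0
labels_a = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # one-hot labels: classes 0, 1, 1

A = affinity_matrix(feats_a, feats_b)
labels_b = propagate(labels_a, A)
predicted = [row.index(max(row)) for row in labels_b]
print(predicted)
```

Because each column of the affinity sums to 1, propagated one-hot labels remain a valid distribution per location. The same matrix could in principle be read at a coarser granularity for region-level localization, which is the sharing the abstract describes; how the paper actually couples the two tasks is not shown here.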

Results

Task                                       Dataset           Metric              Value  Model
Video                                      DAVIS 2017 (val)  F-measure (Mean)    61.3   UVC
Video                                      DAVIS 2017 (val)  F-measure (Recall)  69.8   UVC
Video                                      DAVIS 2017 (val)  J&F                 59.5   UVC
Video                                      DAVIS 2017 (val)  Jaccard (Mean)      57.7   UVC
Video                                      DAVIS 2017 (val)  Jaccard (Recall)    68.3   UVC
Video Object Segmentation                  DAVIS 2017 (val)  F-measure (Mean)    61.3   UVC
Video Object Segmentation                  DAVIS 2017 (val)  F-measure (Recall)  69.8   UVC
Video Object Segmentation                  DAVIS 2017 (val)  J&F                 59.5   UVC
Video Object Segmentation                  DAVIS 2017 (val)  Jaccard (Mean)      57.7   UVC
Video Object Segmentation                  DAVIS 2017 (val)  Jaccard (Recall)    68.3   UVC
Semi-Supervised Video Object Segmentation  DAVIS 2017 (val)  F-measure (Mean)    61.3   UVC
Semi-Supervised Video Object Segmentation  DAVIS 2017 (val)  F-measure (Recall)  69.8   UVC
Semi-Supervised Video Object Segmentation  DAVIS 2017 (val)  J&F                 59.5   UVC
Semi-Supervised Video Object Segmentation  DAVIS 2017 (val)  Jaccard (Mean)      57.7   UVC
Semi-Supervised Video Object Segmentation  DAVIS 2017 (val)  Jaccard (Recall)    68.3   UVC
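
The J&F value in the results is consistent with the usual DAVIS convention of averaging the Jaccard mean and the F-measure mean. The short check below reproduces it from the two listed values; the function name is hypothetical and chosen only for this sketch.

```python
def j_and_f(jaccard_mean, f_measure_mean):
    """Average of region similarity (J mean) and boundary accuracy (F mean)."""
    return (jaccard_mean + f_measure_mean) / 2.0

# Values taken from the UVC rows on DAVIS 2017 (val).
score = j_and_f(57.7, 61.3)
print(round(score, 1))  # 59.5
```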

Related Papers

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results (2025-07-17)
YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association (2025-07-16)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking (2025-07-10)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking (2025-07-07)
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (2025-07-01)