Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Self-supervised Learning for Video Correspondence Flow

Zihang Lai, Weidi Xie

2019-05-02
Tasks: Unsupervised Video Object Segmentation, Semi-Supervised Video Object Segmentation, Self-Supervised Learning, Video Segmentation, Video Semantic Segmentation

Abstract

The objective of this paper is self-supervised learning of feature embeddings that are suitable for matching correspondences across video frames, which we term correspondence flow. By leveraging the natural spatio-temporal coherence of videos, we propose to train a "pointer" that reconstructs a target frame by copying pixels from a reference frame. We make the following contributions. First, we introduce a simple information bottleneck that forces the model to learn robust features for correspondence matching and prevents it from learning trivial solutions, e.g. matching based on low-level colour information. Second, to tackle tracker drift caused by complex object deformations, illumination changes and occlusions, we propose to train a recursive model over long temporal windows with scheduled sampling and cycle consistency. Third, we achieve state-of-the-art performance on the DAVIS 2017 video segmentation and JHMDB keypoint tracking tasks, outperforming all previous self-supervised learning approaches by a significant margin. Fourth, to shed light on the potential of self-supervised learning for video correspondence flow, we probe the upper bound by training on additional data, i.e. more diverse videos, which yields further significant improvements on video segmentation.
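The copy-based reconstruction described in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation: the affinity is assumed to be a temperature-scaled dot product normalised with a softmax over all reference pixels (the actual model may restrict matching to a local window), the L1 reconstruction loss is an assumption, and `copy_reconstruct`, `cycle_reconstruct` and `l1_loss` are hypothetical names. Plain lists stand in for the features a CNN encoder would produce; the information bottleneck and scheduled sampling are not modelled. `cycle_reconstruct` is a simplified two-frame stand-in for the paper's cycle consistency: it propagates colours forward to the target frame and back again, so a good embedding should recover the reference colours.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def copy_reconstruct(feat_tgt, feat_ref, colour_ref, temperature=1.0):
    """Reconstruct target colours by copying from reference pixels (hypothetical helper).

    feat_tgt:   N target feature vectors
    feat_ref:   M reference feature vectors
    colour_ref: M reference colours (e.g. 3 channels each)
    Returns the N reconstructed colours and the N x M affinity rows.
    """
    recon, affinity = [], []
    n_channels = len(colour_ref[0])
    for f_t in feat_tgt:
        # Soft "pointer": a distribution over all reference pixels (assumed global matching)
        weights = softmax([dot(f_t, f_r) / temperature for f_r in feat_ref])
        affinity.append(weights)
        # Copy a weighted mix of reference colours, channel by channel
        recon.append([sum(w * c[k] for w, c in zip(weights, colour_ref))
                      for k in range(n_channels)])
    return recon, affinity


def l1_loss(pred, target):
    # Mean absolute error over all pixels and channels (assumed loss)
    diffs = [abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


def cycle_reconstruct(feat_ref, feat_tgt, colour_ref, temperature=1.0):
    # Forward: copy reference colours into the target frame
    colour_tgt_hat, _ = copy_reconstruct(feat_tgt, feat_ref, colour_ref, temperature)
    # Backward: copy the *predicted* target colours back into the reference frame
    colour_ref_hat, _ = copy_reconstruct(feat_ref, feat_tgt, colour_tgt_hat, temperature)
    return colour_ref_hat


# Hypothetical toy data: 3 pixels with sharply discriminative one-hot features
feats = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
colours = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.9]]

recon, _ = copy_reconstruct(feats, feats, colours)
print(l1_loss(recon, colours))  # near zero: each pixel points at itself
print(l1_loss(cycle_reconstruct(feats, feats, colours), colours))  # also near zero
```

With uninformative features (for example, all zeros) every weight becomes 1/M and each pixel simply receives the average reference colour, which illustrates that the embedding, not the copy operation, is what carries the correspondence.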

Results

All results are for the CorrFlow model on DAVIS 2017 (val). The same scores are listed under three tasks: Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation.

Metric | Value (%)
J&F | 50.3
Jaccard (Mean) | 48.4
Jaccard (Recall) | 53.2
F-measure (Mean) | 52.2
F-measure (Recall) | 56.0
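The J&F value is consistent with the usual DAVIS convention of averaging the region similarity J (mask intersection over union) and the boundary F-measure: (48.4 + 52.2) / 2 = 50.3. The sketch below illustrates these quantities under assumptions: `jaccard`, `mean_and_recall` and `j_and_f` are hypothetical helpers, recall is taken to be the fraction of objects whose score exceeds 0.5 (our reading of the DAVIS protocol), and the boundary F-measure itself, which the official DAVIS evaluation computes on object contours with a small spatial tolerance, is not implemented here. `jaccard` returns a fraction in [0, 1], while `j_and_f` is applied to the percentages reported above.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumed convention: two empty masks count as a perfect match
    return 1.0 if union == 0 else inter / union


def mean_and_recall(scores, threshold=0.5):
    """Mean of per-object scores and the fraction of objects scoring above `threshold`."""
    mean = sum(scores) / len(scores)
    recall = sum(1 for s in scores if s > threshold) / len(scores)
    return mean, recall


def j_and_f(j_mean, f_mean):
    # J&F: the average of the region (J) and boundary (F) means
    return (j_mean + f_mean) / 2


# Flattened 2x2 masks (hypothetical): intersection 1, union 3
print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))
# Hypothetical per-object J scores for four objects
print(mean_and_recall([0.7, 0.4, 0.9, 0.55]))
# Reported CorrFlow means on DAVIS 2017 (val), in percent
print(round(j_and_f(48.4, 52.2), 1))  # 50.3
```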

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation (2025-07-13)
MUVOD: A Novel Multi-view Video Object Segmentation Dataset and A Benchmark for 3D Segmentation (2025-07-10)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (2025-07-01)
ShapeEmbed: a self-supervised learning framework for 2D contour quantification (2025-07-01)