Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

Roy Miles, Mehmet Kerim Yucel, Bruno Manganelli, Albert Saa-Garriga

Published: 2023-03-14, CVPR 2023

Tasks: Semi-Supervised Video Object Segmentation, Representation Learning, Semantic Segmentation, Video Object Segmentation, Contrastive Learning, Video Semantic Segmentation, Knowledge Distillation

Abstract

This paper tackles the problem of semi-supervised video object segmentation on resource-constrained devices, such as mobile phones. We formulate this problem as a distillation task and demonstrate that small space-time-memory networks with finite memory can achieve results competitive with the state of the art at a fraction of the computational cost (32 milliseconds per frame on a Samsung Galaxy S22). Specifically, we provide a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning. The resulting models benefit jointly from pixel-wise contrastive learning and distillation from a pre-trained teacher. We validate this loss by achieving J&F scores competitive with the state of the art on both the standard DAVIS and YouTube benchmarks, despite running up to 5x faster and with 32x fewer parameters.
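To make the two ingredients named in the abstract concrete, here is a minimal sketch that adds a supervised contrastive term over pixel embeddings to a temperature-softened distillation term. It is not the paper's unified loss: it is a plain weighted sum of two textbook losses, and every name in it (`supcon_loss`, `kd_loss`, `combined_loss`, `tau`, `T`, `lam`) is a hypothetical choice made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def supcon_loss(embeddings, labels, tau=0.1):
    """Generic supervised contrastive loss over a list of pixel embeddings.

    Pixels sharing a label are positives; all other pixels form the
    contrast set. tau is an assumed temperature, not a value from the paper.
    """
    z = [normalize(e) for e in embeddings]
    total, count = 0.0, 0
    for i, zi in enumerate(z):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: sum(a * b for a, b in zip(zi, z[j])) / tau for j in others}
        m = max(sims.values())
        # log-sum-exp over the contrast set, shifted by m for stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        count += 1
    return total / count if count else 0.0

def softmax(logits, T=1.0):
    """Softmax of logits divided by temperature T."""
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Mean per-pixel T^2 * KL(teacher || student) on softened outputs."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p, q = softmax(t, T), softmax(s, T)
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q) if pi > 0)
    return T * T * total / len(student_logits)

def combined_loss(embeddings, labels, student_logits, teacher_logits, lam=1.0):
    """Assumed combination: contrastive term plus lam times distillation term."""
    return supcon_loss(embeddings, labels) + lam * kd_loss(student_logits, teacher_logits)

# Toy data: four "pixels" with 2-D embeddings and 2-class logits (made up).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
student = [[2.0, 0.5], [1.5, 0.2], [0.1, 1.8], [0.3, 2.2]]
teacher = [[3.0, 0.0], [2.5, 0.1], [0.0, 3.0], [0.2, 2.9]]
print(combined_loss(emb, labels, student, teacher))
```

The contrastive term falls as same-label pixel embeddings cluster together, and the distillation term reaches zero when the student's softened predictions match the teacher's.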

Results

| Dataset | Model | Metric | Value |
|---|---|---|---|
| YouTube-VOS 2019 | MobileVOS | F-Measure (Seen) | 87.7 |
| YouTube-VOS 2019 | MobileVOS | F-Measure (Unseen) | 85.3 |
| YouTube-VOS 2019 | MobileVOS | Jaccard (Seen) | 83.2 |
| YouTube-VOS 2019 | MobileVOS | Jaccard (Unseen) | 76.9 |
| YouTube-VOS 2019 | MobileVOS | Mean Jaccard & F-Measure | 83.3 |
| DAVIS 2016 | MobileVOS (val) | F-Score | 92.6 |
| DAVIS 2016 | MobileVOS (val) | J&F | 91.4 |
| DAVIS 2016 | MobileVOS (val) | Jaccard (Mean) | 90.3 |
| DAVIS 2017 (val) | MobileVOS (BL30K) | F-measure (Mean) | 88.9 |
| DAVIS 2017 (val) | MobileVOS (BL30K) | J&F | 82.3 |
| DAVIS 2017 (val) | MobileVOS (BL30K) | Params (M) | 8.1 |
| DAVIS 2017 (val) | MobileVOS (BL30K) | Speed (FPS) | 90.6 |
| DAVIS 2017 (val) | MobileVOS | F-measure (Mean) | 87.1 |
| DAVIS 2017 (val) | MobileVOS | J&F | 80.2 |
| DAVIS 2017 (val) | MobileVOS | Params (M) | 8.1 |
| DAVIS 2017 (val) | MobileVOS | Speed (FPS) | 90.6 |
| DAVIS 2016 | MobileVOS (BL30K) | F-measure (Mean) | 92.6 |
| DAVIS 2016 | MobileVOS (BL30K) | J&F | 91.4 |
| DAVIS 2016 | MobileVOS (BL30K) | Jaccard (Mean) | 90.3 |
| DAVIS 2016 | MobileVOS (BL30K) | Speed (FPS) | 100.1 |
| DAVIS 2016 | MobileVOS | F-measure (Mean) | 91.6 |
| DAVIS 2016 | MobileVOS | J&F | 90.6 |
| DAVIS 2016 | MobileVOS | Jaccard (Mean) | 89.7 |
| DAVIS 2016 | MobileVOS | Speed (FPS) | 100.1 |

All rows are listed identically under the Video and Video Object Segmentation tasks. The DAVIS 2017 (val) rows and the DAVIS 2016 rows for MobileVOS (BL30K) and MobileVOS are also listed under Semi-Supervised Video Object Segmentation.
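
The aggregate scores in the results can be cross-checked against their components. The sketch below assumes the usual benchmark conventions: DAVIS J&F is the mean of the Jaccard and F-measure means, and the YouTube-VOS overall score is the mean of the four seen/unseen J and F values. The helper names (`mean`, `implied_j`, `frame_time_ms`) are hypothetical. Because the listed values are rounded to one decimal, recomputed aggregates can differ from the reported ones in the last digit.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def implied_j(jf, f):
    """Back out the Jaccard mean from J&F and F, assuming J&F = (J + F) / 2."""
    return 2 * jf - f

def frame_time_ms(fps):
    """Convert throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps

# DAVIS 2016, MobileVOS: Jaccard (Mean) 89.7, F-measure (Mean) 91.6.
davis16_jf = mean([89.7, 91.6])          # reported J&F: 90.6

# YouTube-VOS 2019: J seen, J unseen, F seen, F unseen.
ytvos_overall = mean([83.2, 76.9, 87.7, 85.3])  # reported: 83.3

# DAVIS 2017 (val) lists J&F and F but not J, so J is inferred here.
davis17_j_bl30k = implied_j(82.3, 88.9)
davis17_j_plain = implied_j(80.2, 87.1)

print(round(davis16_jf, 2), round(ytvos_overall, 3))
print(round(davis17_j_bl30k, 1), round(davis17_j_plain, 1))
print(round(frame_time_ms(90.6), 2), round(frame_time_ms(100.1), 2))
```

The FPS figures are listed without a hardware description, so the per-frame times derived from them should not be compared directly with the 32 milliseconds per frame quoted in the abstract for a Samsung Galaxy S22.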

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)