Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixé, Daniel Cremers, Luc Van Gool
This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Although all frames are processed independently, the results are temporally coherent and stable. We perform experiments on two annotated video segmentation databases, which show that OSVOS is fast and improves the state of the art by a significant margin (79.8% vs. 68.0% mean Jaccard index on DAVIS 2016).
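The one-shot recipe (adapt a model to the single annotated first frame, then segment every later frame independently) can be illustrated with a deliberately tiny toy. This is a minimal sketch under strong assumptions, not the authors' implementation: the fully-convolutional network and its ImageNet and foreground-segmentation pre-training are replaced by a hypothetical one-parameter intensity threshold, and "fine-tuning" becomes a grid search for the threshold that best reproduces the first-frame mask. All function names here are illustrative.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale intensities in [0, 1]
Mask = List[List[int]]     # 1 = object, 0 = background


def apply_threshold(frame: Frame, t: float) -> Mask:
    """Hypothetical stand-in for the network: a pixel is 'object' if brighter than t."""
    return [[int(x > t) for x in row] for row in frame]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    inter = sum(p & g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p | g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return 1.0 if union == 0 else inter / union


def fit_on_first_frame(frame: Frame, mask: Mask, steps: int = 100) -> float:
    """Toy 'one-shot' adaptation: choose the threshold that best reproduces the annotated mask."""
    candidates = [i / steps for i in range(steps + 1)]
    scores = [iou(apply_threshold(frame, t), mask) for t in candidates]
    best = max(scores)
    good = [t for t, s in zip(candidates, scores) if s == best]
    # Use the middle of the span of best-scoring thresholds, so the decision
    # boundary is not pressed against the intensities seen in the first frame.
    return (good[0] + good[-1]) / 2


def segment_video(video: Sequence[Frame], first_mask: Mask) -> List[Mask]:
    t = fit_on_first_frame(video[0], first_mask)
    # Every frame is segmented independently; no temporal propagation is used.
    return [apply_threshold(frame, t) for frame in video]
```

In the toy, as in the method described above, nothing is carried from one frame to the next: temporal consistency can only come from the adapted appearance model itself.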
The same results are listed under the tasks Video, Video Object Segmentation, and Semi-Supervised Video Object Segmentation; the YouTube-VOS 2018 F-measure (Seen), F-measure (Unseen), and Overall values are also listed under Object Tracking and Visual Object Tracking (with Overall labelled "O (Average of Measures)").

| Dataset | Metric | Value | Model |
|---|---|---|---|
| DAVIS 2017 (val) | F-measure (Decay) | 27 | OSVOS |
| DAVIS 2017 (val) | F-measure (Mean) | 63.9 | OSVOS |
| DAVIS 2017 (val) | F-measure (Recall) | 73.8 | OSVOS |
| DAVIS 2017 (val) | J&F | 60.25 | OSVOS |
| DAVIS 2017 (val) | Jaccard (Decay) | 26.1 | OSVOS |
| DAVIS 2017 (val) | Jaccard (Mean) | 56.6 | OSVOS |
| DAVIS 2017 (val) | Jaccard (Recall) | 63.8 | OSVOS |
| DAVIS 2016 | F-measure (Decay) | 15 | OSVOS |
| DAVIS 2016 | F-measure (Mean) | 80.6 | OSVOS |
| DAVIS 2016 | F-measure (Recall) | 92.6 | OSVOS |
| DAVIS 2016 | J&F | 80.2 | OSVOS |
| DAVIS 2016 | Jaccard (Decay) | 14.9 | OSVOS |
| DAVIS 2016 | Jaccard (Mean) | 79.8 | OSVOS |
| DAVIS 2016 | Jaccard (Recall) | 93.6 | OSVOS |
| YouTube | mIoU | 0.783 | OSVOS |
| DAVIS 2017 (test-dev) | F-measure (Decay) | 19.8 | OSVOS |
| DAVIS 2017 (test-dev) | F-measure (Recall) | 59.7 | OSVOS |
| DAVIS 2017 (test-dev) | J&F | 50.9 | OSVOS |
| DAVIS 2017 (test-dev) | Jaccard (Decay) | 19.2 | OSVOS |
| DAVIS 2017 (test-dev) | Jaccard (Mean) | 47 | OSVOS |
| DAVIS 2017 (test-dev) | Jaccard (Recall) | 52.1 | OSVOS |
| YouTube-VOS 2018 | F-measure (Seen) | 60.5 | OSVOS |
| YouTube-VOS 2018 | F-measure (Unseen) | 60.7 | OSVOS |
| YouTube-VOS 2018 | Jaccard (Seen) | 59.8 | OSVOS |
| YouTube-VOS 2018 | Jaccard (Unseen) | 54.2 | OSVOS |
| YouTube-VOS 2018 | Overall | 58.8 | OSVOS |
| YouTube-VOS 2018 | Speed (FPS) | 0.1 | OSVOS |
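
The metrics in the table come from the DAVIS and YouTube-VOS benchmarks. The block below is a simplified sketch of how they relate, assuming these definitions: Jaccard (J) is mask intersection over union; F is a boundary F-measure, approximated here by matching boundary pixels within a Chebyshev distance `tol` (as far as we know, the official DAVIS code uses its own boundary extraction and a tolerance proportional to the image diagonal); Recall is the fraction of frames scoring above 0.5; Decay is the mean score over roughly the first quarter of frames minus the mean over the last quarter; J&F is the average of J mean and F mean; and the YouTube-VOS Overall score is the average of J and F over seen and unseen categories. All function names are hypothetical and this is not the official evaluation toolkit.

```python
from statistics import mean
from typing import List, Sequence, Set, Tuple

Mask = List[List[int]]  # 1 = object, 0 = background


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union (1.0 if both masks are empty)."""
    inter = sum(p & g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p | g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return 1.0 if union == 0 else inter / union


def boundary(mask: Mask) -> Set[Tuple[int, int]]:
    """Object pixels with a 4-connected background or out-of-image neighbour."""
    h, w = len(mask), len(mask[0])
    pts = set()
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    pts.add((r, c))
                    break
    return pts


def boundary_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """Simplified contour accuracy F: harmonic mean of boundary precision and recall."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0

    def near(p: Tuple[int, int], pts: Set[Tuple[int, int]]) -> bool:
        return any(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= tol for q in pts)

    precision = sum(near(p, bg) for p in bp) / len(bp)
    recall = sum(near(g, bp) for g in bg) / len(bg)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize(per_frame: Sequence[float], threshold: float = 0.5) -> dict:
    """Mean, recall and decay of one sequence's per-frame J or F scores (simplified)."""
    n = len(per_frame)
    q = max(1, n // 4)  # simplified quarter split; decay is only meaningful for n >= 4
    return {
        "mean": mean(per_frame),
        "recall": sum(s > threshold for s in per_frame) / n,
        "decay": mean(per_frame[:q]) - mean(per_frame[-q:]),
    }


def jf_mean(j_mean: float, f_mean: float) -> float:
    """DAVIS J&F: average of region similarity and contour accuracy."""
    return (j_mean + f_mean) / 2


def youtube_vos_overall(j_seen: float, j_unseen: float, f_seen: float, f_unseen: float) -> float:
    """YouTube-VOS overall score: average of the four J and F measures."""
    return mean([j_seen, j_unseen, f_seen, f_unseen])
```

The aggregate rows in the table are consistent with these averages: on DAVIS 2016, (79.8 + 80.6) / 2 = 80.2; on DAVIS 2017 (val), (56.6 + 63.9) / 2 = 60.25; and on YouTube-VOS 2018, (59.8 + 54.2 + 60.5 + 60.7) / 4 = 58.8.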