Paul Voigtlaender, Bastian Leibe
We tackle the task of semi-supervised video object segmentation, i.e., segmenting the pixels belonging to an object in a video given the ground-truth pixel mask for the first frame. We build on the recently introduced one-shot video object segmentation (OSVOS) approach, which fine-tunes a pretrained network on the first frame. While achieving impressive performance, OSVOS uses the fine-tuned network in unchanged form at test time and therefore cannot adapt to large changes in object appearance. To overcome this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS), which updates the network online using training examples selected based on the confidence of the network and the spatial configuration. Additionally, we add a pretraining step based on objectness, which is learned on PASCAL. Our experiments show that both extensions are highly effective and improve the state of the art on DAVIS to an intersection-over-union score of 85.7%.
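
The selection of online training examples described in the abstract can be illustrated with a minimal sketch. It assumes that "confidence" means thresholding the predicted foreground probability and that "spatial configuration" means distance from the previous frame's mask. The function names, the label encoding, the default thresholds, and the rule that far-away pixels count as negatives even when the network is confident are all assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np

# Hypothetical label encoding for the online training target.
POSITIVE, NEGATIVE, IGNORE = 1, 0, -1


def distance_to_mask(mask):
    """Euclidean distance from every pixel to the nearest foreground pixel.

    Brute force for clarity; a real pipeline would likely use a
    distance transform instead.
    """
    fg = np.argwhere(mask)  # (K, 2) coordinates of foreground pixels
    if fg.size == 0:
        return np.full(mask.shape, np.inf)
    ys, xs = np.indices(mask.shape)
    grid = np.stack([ys, xs], axis=-1).reshape(-1, 1, 2)  # (H*W, 1, 2)
    dist = np.sqrt(((grid - fg[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    return dist.reshape(mask.shape)


def select_online_labels(prob, last_mask, pos_threshold=0.97, neg_distance=20.0):
    """Build a per-pixel training target for one online update step.

    prob          : (H, W) predicted foreground probability for the current frame
    last_mask     : (H, W) boolean mask predicted for the previous frame
    pos_threshold : assumed confidence threshold for positive examples
    neg_distance  : assumed distance (in pixels) beyond which pixels are negatives
    """
    prob = np.asarray(prob, dtype=float)
    last_mask = np.asarray(last_mask, dtype=bool)

    labels = np.full(prob.shape, IGNORE, dtype=np.int8)
    far = distance_to_mask(last_mask) > neg_distance

    # Spatial cue: pixels far from the last mask are treated as background.
    labels[far] = NEGATIVE
    # Confidence cue: confident pixels near the last mask are treated as foreground.
    labels[(prob > pos_threshold) & ~far] = POSITIVE
    # Everything else stays IGNORE and would not contribute to the loss.
    return labels
```

Pixels that are neither confidently foreground nor far from the last mask are left unlabeled in this sketch, so an update step would only learn from examples the two cues agree are reliable.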
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | DAVIS 2017 (val) | F-measure (Decay) | 26.6 | OnAVOS |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 69.1 | OnAVOS |
| Video | DAVIS 2017 (val) | F-measure (Recall) | 75.4 | OnAVOS |
| Video | DAVIS 2017 (val) | J&F | 65.35 | OnAVOS |
| Video | DAVIS 2017 (val) | Jaccard (Decay) | 27.9 | OnAVOS |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 61.6 | OnAVOS |
| Video | DAVIS 2017 (val) | Jaccard (Recall) | 67.4 | OnAVOS |
| Video | DAVIS 2016 | F-measure (Decay) | 5.8 | OnAVOS |
| Video | DAVIS 2016 | F-measure (Mean) | 84.9 | OnAVOS |
| Video | DAVIS 2016 | F-measure (Recall) | 89.7 | OnAVOS |
| Video | DAVIS 2016 | J&F | 85.5 | OnAVOS |
| Video | DAVIS 2016 | Jaccard (Decay) | 5.2 | OnAVOS |
| Video | DAVIS 2016 | Jaccard (Mean) | 86.1 | OnAVOS |
| Video | DAVIS 2016 | Jaccard (Recall) | 96.1 | OnAVOS |
| Video | YouTube | mIoU | 0.774 | OnAVOS |
| Video | DAVIS 2017 (test-dev) | F-measure (Decay) | 23.4 | OnAVOS |
| Video | DAVIS 2017 (test-dev) | F-measure (Recall) | 60.3 | OnAVOS |
| Video | DAVIS 2017 (test-dev) | J&F | 52.8 | OnAVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Decay) | 23 | OnAVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 49.9 | OnAVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Recall) | 54.3 | OnAVOS |
| Object Tracking | YouTube-VOS 2018 | F-Measure (Seen) | 62.7 | OnAVOS |
| Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 51.4 | OnAVOS |
| Object Tracking | YouTube-VOS 2018 | Jaccard (Seen) | 60.1 | OnAVOS |
| Object Tracking | YouTube-VOS 2018 | Jaccard (Unseen) | 46.6 | OnAVOS |
| Object Tracking | YouTube-VOS 2018 | O (Average of Measures) | 55.2 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 26.6 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 69.1 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 75.4 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | J&F | 65.35 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 27.9 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 61.6 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 67.4 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 5.8 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 84.9 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 89.7 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | J&F | 85.5 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 5.2 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 86.1 | OnAVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 96.1 | OnAVOS |
| Video Object Segmentation | YouTube | mIoU | 0.774 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 23.4 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 60.3 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 52.8 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 23 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 49.9 | OnAVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 54.3 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 26.6 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 69.1 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 75.4 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 65.35 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 27.9 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 61.6 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 67.4 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 5.8 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 84.9 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 89.7 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 85.5 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 5.2 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 86.1 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 96.1 | OnAVOS |
| Semi-Supervised Video Object Segmentation | YouTube | mIoU | 0.774 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 23.4 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 60.3 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 52.8 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 23 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 49.9 | OnAVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 54.3 | OnAVOS |
| Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Seen) | 62.7 | OnAVOS |
| Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 51.4 | OnAVOS |
| Visual Object Tracking | YouTube-VOS 2018 | Jaccard (Seen) | 60.1 | OnAVOS |
| Visual Object Tracking | YouTube-VOS 2018 | Jaccard (Unseen) | 46.6 | OnAVOS |
| Visual Object Tracking | YouTube-VOS 2018 | O (Average of Measures) | 55.2 | OnAVOS |
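
To make the metrics in the table concrete, the following sketch computes the Jaccard index (intersection-over-union) of two binary masks and combines mean scores into J&F. The J&F values listed for DAVIS 2016 and DAVIS 2017 (val) match the average of the Jaccard (Mean) and F-measure (Mean) rows, which is what `j_and_f` reproduces. The function names and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
import numpy as np


def jaccard(pred, gt):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j_mean, f_mean):
    """Average of the mean Jaccard score and the mean F-measure."""
    return (j_mean + f_mean) / 2.0


# Values taken from the table rows above.
davis16 = j_and_f(86.1, 84.9)      # compare with the DAVIS 2016 J&F row
davis17_val = j_and_f(61.6, 69.1)  # compare with the DAVIS 2017 (val) J&F row
```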