Jonathon Luiten, Paul Voigtlaender, Bastian Leibe
We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence given the first-frame ground-truth annotations. To this end, we present the PReMVOS algorithm (Proposal-generation, Refinement and Merging for Video Object Segmentation). Our method splits the problem into two steps. It first generates a set of accurate object segmentation mask proposals for each video frame. It then selects and merges these proposals into accurate and temporally consistent pixel-wise object tracks over the video sequence, in a way designed specifically to tackle the challenges of segmenting multiple objects across a video. Our approach surpasses all previous state-of-the-art results on the DAVIS 2017 video object segmentation benchmark with a J&F mean score of 71.6 on the test-dev dataset, and achieves first place in both the DAVIS 2018 Video Object Segmentation Challenge and the YouTube-VOS 1st Large-scale Video Object Segmentation Challenge.
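The two-step structure described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: masks are sets of `(row, col)` pixels, the per-frame proposals are taken as given, and selection uses only overlap with the object's previous mask. The abstract does not specify the selection criteria, so this stand-in should not be read as the PReMVOS scoring or merging procedure.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # hypothetical representation: the pixels a mask covers


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def track(first_frame: Dict[int, Mask],
          proposals_per_frame: List[List[Mask]]) -> List[Dict[int, Mask]]:
    """Select one proposal per object in each frame, then merge them.

    Step 1 (proposal generation) is assumed to have produced
    `proposals_per_frame` already. Selection by overlap with the previous
    mask is an illustrative stand-in, not the paper's criterion.
    """
    tracks = [dict(first_frame)]
    for proposals in proposals_per_frame:
        prev = tracks[-1]
        scored = []
        for obj_id, prev_mask in prev.items():
            # Pick the proposal that best overlaps this object's last mask.
            best = max(proposals, key=lambda p: iou(p, prev_mask), default=set())
            scored.append((iou(best, prev_mask), obj_id, best))
        chosen: Dict[int, Mask] = {}
        claimed: Mask = set()
        # Merge: higher-scoring objects claim their pixels first, so the
        # resulting masks in a frame never overlap.
        for _, obj_id, best in sorted(scored, key=lambda t: (-t[0], t[1])):
            chosen[obj_id] = set(best) - claimed
            claimed |= chosen[obj_id]
        tracks.append(chosen)
    return tracks
```

Calling `track(first_frame_masks, proposals)` returns one `{object_id: mask}` dictionary per frame, starting with the given first-frame annotation.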
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 19.5 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 81.8 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 88.9 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 77.85 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 16.2 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 73.9 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 83.1 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 9.8 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 88.6 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 94.7 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 86.75 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 8.8 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 84.9 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 96.1 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 20.6 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 75.8 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 84.3 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 71.6 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 21.7 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 67.5 | PReMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 76.8 | PReMVOS |
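
As a reading aid for the table, the sketch below shows how the region metric and the combined score relate. It assumes the usual DAVIS definitions: J is the Jaccard index (intersection over union) of predicted and ground-truth masks, and J&F is the plain average of the J mean and the F mean. The function names and the list-of-lists mask format are hypothetical, and the contour measure F is not implemented here.

```python
from typing import List

Mask = List[List[int]]  # hypothetical format: rows of 0/1 pixel labels


def jaccard(pred: Mask, truth: Mask) -> float:
    """Jaccard index (intersection over union) of two binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def j_and_f(j_mean: float, f_mean: float) -> float:
    """Combined score: the average of the J mean and the F mean."""
    return (j_mean + f_mean) / 2


# Consistency check against the rows above.
print(j_and_f(73.9, 81.8))  # DAVIS 2017 (val), table reports 77.85
print(j_and_f(84.9, 88.6))  # DAVIS 2016, table reports 86.75
```

The same average for DAVIS 2017 (test-dev), from 67.5 and 75.8, is 71.65, which agrees with the reported 71.6 up to rounding of the per-metric means.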