Zongxin Yang, Yunchao Wei, Yi Yang
This paper investigates the principles of embedding learning to tackle the challenging task of semi-supervised video object segmentation. Unlike previous practices that only explore embedding learning using pixels from the foreground object(s), we argue that the background should be treated equally and thus propose the Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach. CFBI implicitly encourages the feature embeddings of the target foreground object and its corresponding background to be contrastive, which in turn improves the segmentation results. With feature embeddings from both foreground and background, CFBI performs the matching process between the reference and the predicted sequence at both the pixel and instance levels, making CFBI robust to various object scales. We conduct extensive experiments on three popular benchmarks, i.e., DAVIS 2016, DAVIS 2017, and YouTube-VOS. CFBI achieves a performance (J&F) of 89.4%, 81.9%, and 81.4%, respectively, outperforming all the other state-of-the-art methods. Code: https://github.com/z-x-yang/CFBI.
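
To make the foreground-background matching idea concrete, here is a minimal NumPy sketch of pixel-level matching only. It is an illustration under stated assumptions, not the authors' implementation: the function names (`nearest_distance_maps`, `segment`), the plain squared Euclidean distance, and the final "closer to foreground than to background" decision rule are all hypothetical simplifications. Instance-level matching and the learned network are omitted; the actual method is in the repository linked above.

```python
import numpy as np

def nearest_distance_maps(ref_emb, ref_mask, cur_emb):
    """For every current-frame pixel, return its distance to the nearest
    reference foreground pixel and to the nearest reference background pixel.

    ref_emb, cur_emb: (H, W, C) pixel embeddings (assumed given).
    ref_mask: (H, W) boolean mask, True where the reference pixel is foreground.
    """
    h, w, c = cur_emb.shape
    cur = cur_emb.reshape(-1, c)
    ref = ref_emb.reshape(-1, c)
    mask = ref_mask.reshape(-1).astype(bool)

    # Pairwise squared Euclidean distances, shape (H*W, H*W).
    # Assumption: a plain squared distance stands in for the paper's metric.
    d2 = ((cur[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)

    def min_over(selection):
        # If the reference has no pixels of this kind, nothing can match.
        if not selection.any():
            return np.full(h * w, np.inf)
        return d2[:, selection].min(axis=1)

    fg_dist = min_over(mask).reshape(h, w)
    bg_dist = min_over(~mask).reshape(h, w)
    return fg_dist, bg_dist

def segment(ref_emb, ref_mask, cur_emb):
    """Toy decision rule: foreground if closer to reference foreground
    than to reference background."""
    fg_dist, bg_dist = nearest_distance_maps(ref_emb, ref_mask, cur_emb)
    return fg_dist < bg_dist

# Tiny example: the object sits at the top-left in the reference frame
# and has moved to the bottom-right in the current frame.
ref_emb = np.array([[[1.0, 0.0], [0.0, 1.0]],
                    [[0.0, 1.0], [0.0, 1.0]]])
ref_mask = np.array([[True, False],
                     [False, False]])
cur_emb = np.array([[[0.1, 0.9], [0.1, 0.9]],
                    [[0.1, 0.9], [0.9, 0.1]]])
pred = segment(ref_emb, ref_mask, cur_emb)
print(pred)
```

Treating the background as a second matching target, rather than only thresholding the foreground distance, is what lets the toy rule reject pixels that are merely "somewhat close" to the object but closer still to the background.
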
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | YouTube-VOS 2019 | F-Measure (Seen) | 86.2 | CFBI+ |
| Video | YouTube-VOS 2019 | F-Measure (Unseen) | 85.2 | CFBI+ |
| Video | YouTube-VOS 2019 | Jaccard (Seen) | 81.7 | CFBI+ |
| Video | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 82.6 | CFBI+ |
| Video | YouTube-VOS 2018 | F-Measure (Seen) | 85.8 | CFBI |
| Video | YouTube-VOS 2018 | Jaccard (Seen) | 81.1 | CFBI |
| Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 81.9 | CFBI |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 84.6 | CFBI |
| Video | DAVIS 2017 (val) | J&F | 81.9 | CFBI |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 79.1 | CFBI |
| Video | DAVIS 2016 | F-measure (Mean) | 90.5 | CFBI |
| Video | DAVIS 2016 | J&F | 89.4 | CFBI |
| Video | DAVIS 2016 | Jaccard (Mean) | 88.3 | CFBI |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 78.5 | CFBI |
| Video | DAVIS 2017 (test-dev) | J&F | 74.8 | CFBI |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 71.1 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | D16 val (F) | 86.9 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | D16 val (G) | 86.1 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | D16 val (J) | 85.3 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | D17 val (F) | 77.7 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | D17 val (G) | 74.9 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | D17 val (J) | 72.1 | CFBI |
| Video | DAVIS (no YouTube-VOS training) | FPS | 5.56 | CFBI |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | CFBI |
| Video | YouTube-VOS 2018 | Overall | 81.4 | CFBI |
| Video | YouTube-VOS 2018 | Params(M) | 66.3 | CFBI |
| Video | YouTube-VOS 2018 | Speed (FPS) | 3.4 | CFBI |
| Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Seen) | 86.2 | CFBI+ |
| Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Unseen) | 85.2 | CFBI+ |
| Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Seen) | 81.7 | CFBI+ |
| Video Object Segmentation | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 82.6 | CFBI+ |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 85.8 | CFBI |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 81.1 | CFBI |
| Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 81.9 | CFBI |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 84.6 | CFBI |
| Video Object Segmentation | DAVIS 2017 (val) | J&F | 81.9 | CFBI |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 79.1 | CFBI |
| Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 90.5 | CFBI |
| Video Object Segmentation | DAVIS 2016 | J&F | 89.4 | CFBI |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 88.3 | CFBI |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 78.5 | CFBI |
| Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 74.8 | CFBI |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 71.1 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | D16 val (F) | 86.9 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | D16 val (G) | 86.1 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | D16 val (J) | 85.3 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (F) | 77.7 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (G) | 74.9 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (J) | 72.1 | CFBI |
| Video Object Segmentation | DAVIS (no YouTube-VOS training) | FPS | 5.56 | CFBI |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | CFBI |
| Video Object Segmentation | YouTube-VOS 2018 | Overall | 81.4 | CFBI |
| Video Object Segmentation | YouTube-VOS 2018 | Params(M) | 66.3 | CFBI |
| Video Object Segmentation | YouTube-VOS 2018 | Speed (FPS) | 3.4 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 84.6 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 81.9 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 79.1 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 90.5 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 89.4 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 88.3 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 78.5 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 74.8 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 71.1 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D16 val (F) | 86.9 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D16 val (G) | 86.1 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D16 val (J) | 85.3 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (F) | 77.7 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (G) | 74.9 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | D17 val (J) | 72.1 | CFBI |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | FPS | 5.56 | CFBI |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 85.8 | CFBI |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 81.1 | CFBI |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 75.3 | CFBI |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Overall | 81.4 | CFBI |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Params(M) | 66.3 | CFBI |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Speed (FPS) | 3.4 | CFBI |
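
On the DAVIS benchmarks, J&F is the average of the mean Jaccard (J) and mean F-measure (F) scores. The short check below recomputes it from the J and F values in the table and compares the result with the reported J&F. It assumes that the "G" rows in the table denote the same average; the helper name `j_and_f` and the 0.051 tolerance (to absorb rounding to one decimal place) are illustrative choices.

```python
def j_and_f(j_mean, f_mean):
    """Average of mean Jaccard and mean F-measure."""
    return (j_mean + f_mean) / 2.0

# (J mean, F mean, reported J&F or G), copied from the table above.
reported = {
    "DAVIS 2016": (88.3, 90.5, 89.4),
    "DAVIS 2017 (val)": (79.1, 84.6, 81.9),
    "DAVIS 2017 (test-dev)": (71.1, 78.5, 74.8),
    "D16 val (no YouTube-VOS training)": (85.3, 86.9, 86.1),
    "D17 val (no YouTube-VOS training)": (72.1, 77.7, 74.9),
}

# Every reported score should match the recomputed mean up to rounding.
consistent = {
    name: abs(j_and_f(j, f) - jf) < 0.051
    for name, (j, f, jf) in reported.items()
}

for name, (j, f, jf) in reported.items():
    print(f"{name}: computed {j_and_f(j, f):.2f}, reported {jf}")
```

The YouTube-VOS "Overall" score is not checked here because the table does not list all of the seen/unseen J and F values for that benchmark.
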