Xiaohao Xu, Jinglu Wang, Xiao Li, Yan Lu
Error propagation is a general but crucial problem in online semi-supervised video object segmentation. We aim to suppress error propagation through a highly reliable correction mechanism. The key insight is to disentangle correction from the conventional mask propagation process by using reliable cues. We introduce two modulators, a propagation modulator and a correction modulator, which separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references, respectively. Specifically, we assemble the modulators in a cascaded propagation-correction scheme, which prevents the propagation modulator from overriding the effects of the reliable correction modulator. Although the reference frame with the ground-truth label provides reliable cues, it can differ greatly from the target frame and introduce uncertain or incomplete correlations. We therefore augment the reference cues by adding reliable feature patches to a maintained pool, which offers more comprehensive and expressive object representations to the modulators. In addition, a reliability filter is designed to retrieve reliable patches and pass them on to subsequent frames. Our model achieves state-of-the-art performance on the YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides considerable performance gain by fully utilizing reliable guidance. Code is available at: https://github.com/JerryX1110/RPCMVOS.
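
To make the cascaded scheme described in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that channel-wise re-calibration can be modeled as a sigmoid gate per channel, that the pool is summarized by averaging its patches, and that a reliability score per patch is already available. The names `modulate`, `filter_reliable`, `update_pool`, and `propagate_then_correct`, along with the threshold and all shapes, are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def modulate(features, cue, weight, bias):
    """Channel-wise re-calibration (assumed form): scale each channel by a gate in (0, 1).

    features: (C, H, W) target-frame embedding
    cue:      (D,) guidance vector
    weight:   (C, D) and bias: (C,), stand-ins for learned parameters
    """
    gate = sigmoid(weight @ cue + bias)        # one gate per channel, shape (C,)
    return features * gate[:, None, None]      # broadcast over H and W


def filter_reliable(patches, scores, threshold=0.5):
    """Hypothetical reliability filter: keep patches whose score exceeds the threshold."""
    return patches[scores > threshold]


def update_pool(pool, patches, scores, threshold=0.5):
    """Append the reliable patches to the maintained pool."""
    reliable = filter_reliable(patches, scores, threshold)
    return np.concatenate([pool, reliable], axis=0)


def propagate_then_correct(features, prev_cue, pool, prop_params, corr_params):
    """Cascade: propagation modulator first, correction modulator last."""
    propagated = modulate(features, prev_cue, *prop_params)
    corr_cue = pool.mean(axis=0)               # assumed pool summary
    return modulate(propagated, corr_cue, *corr_params)


rng = np.random.default_rng(0)
C, D, H, W = 4, 3, 2, 2
features = rng.normal(size=(C, H, W))
prev_cue = rng.normal(size=D)                  # cue from local temporal correlation
pool = rng.normal(size=(1, D))                 # pool starts with the reference-frame cue
prop_params = (rng.normal(size=(C, D)), np.zeros(C))
corr_params = (rng.normal(size=(C, D)), np.zeros(C))

new_patches = rng.normal(size=(3, D))
scores = np.array([0.9, 0.2, 0.7])             # made-up reliability scores
pool = update_pool(pool, new_patches, scores)
out = propagate_then_correct(features, prev_cue, pool, prop_params, corr_params)
print(out.shape, pool.shape)
```

Applying the correction gate last is one simple way to express the ordering the abstract argues for: whatever the propagation modulator does, the correction modulator has the final say on each channel.
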
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | YouTube-VOS 2019 | F-Measure (Seen) | 86.9 | RPCMVOS |
| Video | YouTube-VOS 2019 | F-Measure (Unseen) | 87.1 | RPCMVOS |
| Video | YouTube-VOS 2019 | Jaccard (Seen) | 82.6 | RPCMVOS |
| Video | YouTube-VOS 2019 | Jaccard (Unseen) | 79.1 | RPCMVOS |
| Video | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 83.9 | RPCMVOS |
| Video | DAVIS 2017 (test-dev) | F-measure | 82.6 | RPCMVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard | 75.8 | RPCMVOS |
| Video | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 79.2 | RPCMVOS |
| Video | YouTube-VOS 2018 | F-Measure (Seen) | 87.7 | RPCMVOS |
| Video | YouTube-VOS 2018 | F-Measure (Unseen) | 86.7 | RPCMVOS |
| Video | YouTube-VOS 2018 | Jaccard (Seen) | 83.1 | RPCMVOS |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 78.5 | RPCMVOS |
| Video | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 84 | RPCMVOS |
| Video | DAVIS 2017 (val) | Jaccard | 81.3 | RPCMVOS |
| Video | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 83.7 | RPCMVOS |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 86 | RPCMVOS |
| Video | DAVIS 2017 (val) | J&F | 83.7 | RPCMVOS |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 81.3 | RPCMVOS |
| Video | DAVIS 2016 | F-measure (Mean) | 94 | RPCMVOS |
| Video | DAVIS 2016 | J&F | 90.6 | RPCMVOS |
| Video | DAVIS 2016 | Jaccard (Mean) | 87.1 | RPCMVOS |
| Video | YouTube-VOS 2019 | Overall | 83.9 | RPCMVOS |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 84.3 | RPCMVOS-Full-Res |
| Video | DAVIS 2017 (test-dev) | J&F | 81 | RPCMVOS-Full-Res |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 77.6 | RPCMVOS-Full-Res |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 82.6 | RPCMVOS |
| Video | DAVIS 2017 (test-dev) | J&F | 79.2 | RPCMVOS |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 75.8 | RPCMVOS |
| Video | YouTube-VOS 2018 | F-Measure (Seen) | 87.9 | RPCMVOS-MS |
| Video | YouTube-VOS 2018 | F-Measure (Unseen) | 86.9 | RPCMVOS-MS |
| Video | YouTube-VOS 2018 | Jaccard (Seen) | 83.3 | RPCMVOS-MS |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 78.9 | RPCMVOS-MS |
| Video | YouTube-VOS 2018 | Overall | 84.3 | RPCMVOS-MS |
| Video | YouTube-VOS 2018 | Overall | 84 | RPCMVOS |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 78.5 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Seen) | 86.9 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Unseen) | 87.1 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Seen) | 82.6 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Unseen) | 79.1 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2019 | Mean Jaccard & F-Measure | 83.9 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure | 82.6 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard | 75.8 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Mean Jaccard & F-Measure | 79.2 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 87.7 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 86.7 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 83.1 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 78.5 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Mean Jaccard & F-Measure | 84 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard | 81.3 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Mean Jaccard & F-Measure | 83.7 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 86 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (val) | J&F | 83.7 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 81.3 | RPCMVOS |
| Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 94 | RPCMVOS |
| Video Object Segmentation | DAVIS 2016 | J&F | 90.6 | RPCMVOS |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 87.1 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2019 | Overall | 83.9 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 84.3 | RPCMVOS-Full-Res |
| Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 81 | RPCMVOS-Full-Res |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 77.6 | RPCMVOS-Full-Res |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 82.6 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 79.2 | RPCMVOS |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 75.8 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 87.9 | RPCMVOS-MS |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 86.9 | RPCMVOS-MS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 83.3 | RPCMVOS-MS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 78.9 | RPCMVOS-MS |
| Video Object Segmentation | YouTube-VOS 2018 | Overall | 84.3 | RPCMVOS-MS |
| Video Object Segmentation | YouTube-VOS 2018 | Overall | 84 | RPCMVOS |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 78.5 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 86 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 83.7 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 81.3 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 94 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 90.6 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 87.1 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Seen) | 86.9 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2019 | F-Measure (Unseen) | 87.1 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Seen) | 82.6 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2019 | Jaccard (Unseen) | 79.1 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2019 | Overall | 83.9 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 84.3 | RPCMVOS-Full-Res |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 81 | RPCMVOS-Full-Res |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 77.6 | RPCMVOS-Full-Res |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Mean) | 82.6 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 79.2 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 75.8 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 87.9 | RPCMVOS-MS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 86.9 | RPCMVOS-MS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 83.3 | RPCMVOS-MS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 78.9 | RPCMVOS-MS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Overall | 84.3 | RPCMVOS-MS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 87.7 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 86.7 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 83.1 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Overall | 84 | RPCMVOS |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 78.5 | RPCMVOS |
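
The aggregate rows in the table can be cross-checked against their components. On DAVIS, J&F is the mean of the Jaccard and F-measure scores; on YouTube-VOS, the overall score is the mean of the four seen/unseen Jaccard and F-measure values. The short script below recomputes these means from the values listed in the table. The helper name `mean_score` and the 0.06 tolerance (to absorb rounding to one decimal place) are choices made for this sketch.

```python
def mean_score(*parts):
    """Arithmetic mean of the component metrics."""
    return sum(parts) / len(parts)


# (reported aggregate, component scores), all copied from the table above
reported = {
    "DAVIS 2016 J&F": (90.6, (87.1, 94.0)),
    "DAVIS 2017 (val) J&F": (83.7, (81.3, 86.0)),
    "DAVIS 2017 (test-dev) J&F": (79.2, (75.8, 82.6)),
    "DAVIS 2017 (test-dev) J&F, Full-Res": (81.0, (77.6, 84.3)),
    "YouTube-VOS 2018 Overall": (84.0, (83.1, 78.5, 87.7, 86.7)),
    "YouTube-VOS 2018 Overall, MS": (84.3, (83.3, 78.9, 87.9, 86.9)),
    "YouTube-VOS 2019 Overall": (83.9, (82.6, 79.1, 86.9, 87.1)),
}

checks = {}
for name, (value, parts) in reported.items():
    mean = mean_score(*parts)
    # Reported values are rounded to one decimal, so allow a small tolerance.
    checks[name] = abs(mean - value) < 0.06
    print(f"{name}: recomputed {mean:.2f}, reported {value}, consistent={checks[name]}")
```

Note that the YouTube-VOS 2018 overall score of 84 is only reproduced when 78.5 is read as the Jaccard (Unseen) value.
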