Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing
We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading, which struggles with matching noise, especially in the presence of distractors, resulting in lower performance on more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Through these queries, it iteratively interacts with the bottom-up pixel features via a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves over XMem by 8.7 J&F at a similar running time, and over DeAOT by 4.2 J&F while being three times faster. Code is available at: https://hkchengrex.github.io/Cutie
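The abstract describes object queries that read from pixel features under foreground-background masked attention. The sketch below illustrates that idea as a single masked cross-attention step in plain Python. It is not Cutie's implementation: the function names, the scaled dot-product scoring, and the assumption that the first half of the queries may only attend to foreground pixels while the second half may only attend to background pixels are illustrative choices. The actual object transformer is learned and, per the abstract, interacts with the pixel features iteratively.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries receive zero weight."""
    finite = [s for s in scores if s != -math.inf]
    if not finite:
        # Every position is masked out: return all-zero weights.
        return [0.0] * len(scores)
    m = max(finite)
    exps = [math.exp(s - m) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(queries: List[Vector], pixel_keys: List[Vector],
                           pixel_values: List[Vector],
                           fg_mask: List[bool]) -> List[Vector]:
    """Hypothetical helper: object queries read from pixel features.

    Assumption for illustration: the first half of the queries only sees
    foreground pixels, the second half only sees background pixels.
    """
    d = len(queries[0])
    half = len(queries) // 2
    out = []
    for i, q in enumerate(queries):
        attend_fg = i < half
        scores = []
        for k, is_fg in zip(pixel_keys, fg_mask):
            if is_fg == attend_fg:
                # Scaled dot-product similarity between query and pixel key.
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
            else:
                scores.append(-math.inf)  # masked out for this query
        weights = softmax(scores)
        # Weighted sum of the pixel values this query is allowed to see.
        out.append([sum(w * v[j] for w, v in zip(weights, pixel_values))
                    for j in range(len(pixel_values[0]))])
    return out


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
    # Foreground pixels carry value [1, 0]; background pixels carry [0, 1].
    values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    fg_mask = [True, True, False, False]
    out = masked_cross_attention(queries, keys, values, fg_mask)
    print([[round(v, 3) for v in row] for row in out])  # [[1.0, 0.0], [0.0, 1.0]]
```

In the usage example, the first query's output is built only from foreground values and the second only from background values, whatever the keys are. This is the kind of clean foreground/background separation the abstract attributes to the masking.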
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | MOSE | J&F | 68.3 | Cutie |
| Video | M$^3$-VOS | Average IoU | 74.6 | Cutie-base |
| Video | MOSE | F | 75.8 | Cutie+ (base, MEGA) |
| Video | MOSE | FPS | 17.9 | Cutie+ (base, MEGA) |
| Video | MOSE | J | 67.6 | Cutie+ (base, MEGA) |
| Video | MOSE | J&F | 71.7 | Cutie+ (base, MEGA) |
| Video | MOSE | F | 74.5 | Cutie+ (small, MEGA) |
| Video | MOSE | FPS | 20.6 | Cutie+ (small, MEGA) |
| Video | MOSE | J | 66 | Cutie+ (small, MEGA) |
| Video | MOSE | J&F | 70.3 | Cutie+ (small, MEGA) |
| Video | MOSE | F | 74.1 | Cutie (base, MEGA) |
| Video | MOSE | FPS | 36.4 | Cutie (base, MEGA) |
| Video | MOSE | J | 65.8 | Cutie (base, MEGA) |
| Video | MOSE | J&F | 69.9 | Cutie (base, MEGA) |
| Video | MOSE | F | 72.9 | Cutie (small, MEGA) |
| Video | MOSE | FPS | 45.5 | Cutie (small, MEGA) |
| Video | MOSE | J | 64.3 | Cutie (small, MEGA) |
| Video | MOSE | J&F | 68.6 | Cutie (small, MEGA) |
| Video | MOSE | F | 72.3 | Cutie (base, with mose) |
| Video | MOSE | FPS | 36.4 | Cutie (base, with mose) |
| Video | MOSE | J | 64.2 | Cutie (base, with mose) |
| Video | MOSE | J&F | 68.3 | Cutie (base, with mose) |
| Video | MOSE | F | 71.7 | Cutie (small, with mose) |
| Video | MOSE | FPS | 45.5 | Cutie (small, with mose) |
| Video | MOSE | J | 63.1 | Cutie (small, with mose) |
| Video | MOSE | J&F | 67.4 | Cutie (small, with mose) |
| Video | MOSE | F | 67.9 | Cutie (base) |
| Video | MOSE | FPS | 36.4 | Cutie (base) |
| Video | MOSE | J | 60 | Cutie (base) |
| Video | MOSE | J&F | 64 | Cutie (base) |
| Video | MOSE | F | 66.2 | Cutie (small) |
| Video | MOSE | FPS | 45.5 | Cutie (small) |
| Video | MOSE | J | 58.2 | Cutie (small) |
| Video | MOSE | J&F | 62.2 | Cutie (small) |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 93.4 | Cutie+ (base) |
| Video | DAVIS 2017 (val) | J&F | 90.5 | Cutie+ (base) |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 87.5 | Cutie+ (base) |
| Video | DAVIS 2017 (val) | Speed (FPS) | 17.9 | Cutie+ (base) |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 90.8 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (val) | J&F | 88.1 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 85.5 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (val) | Speed (FPS) | 17.9 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 91.1 | Cutie (base) |
| Video | DAVIS 2017 (val) | J&F | 87.9 | Cutie (base) |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 84.6 | Cutie (base) |
| Video | DAVIS 2017 (val) | Speed (FPS) | 36.4 | Cutie (base) |
| Video | YouTube-VOS 2019 | F-Measure (Seen) | 90.6 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2019 | F-Measure (Unseen) | 90.5 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2019 | Speed (FPS) | 17.9 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2019 | Jaccard (Seen) | 86.3 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2019 | Jaccard (Unseen) | 82.7 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2019 | Overall | 87.5 | Cutie+ (base, MEGA) |
| Video | BURST-test | HOTA (all) | 66 | Cutie (base, MEGA, 600 pixels) |
| Video | BURST-test | HOTA (common) | 66.5 | Cutie (base, MEGA, 600 pixels) |
| Video | BURST-test | HOTA (uncommon) | 65.9 | Cutie (base, MEGA, 600 pixels) |
| Video | BURST-test | HOTA (all) | 62.6 | Cutie (base, with mose, 600 pixels) |
| Video | BURST-test | HOTA (common) | 63.8 | Cutie (base, with mose, 600 pixels) |
| Video | BURST-test | HOTA (uncommon) | 62.3 | Cutie (base, with mose, 600 pixels) |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 91.4 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | FPS | 17.9 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | J&F | 88.1 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 84.7 | Cutie+ (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 89.9 | Cutie (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | FPS | 36.4 | Cutie (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | J&F | 86.1 | Cutie (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 82.4 | Cutie (base, MEGA) |
| Video | DAVIS 2017 (test-dev) | F-measure (Mean) | 89.2 | Cutie+ (base) |
| Video | DAVIS 2017 (test-dev) | FPS | 17.9 | Cutie+ (base) |
| Video | DAVIS 2017 (test-dev) | J&F | 85.9 | Cutie+ (base) |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 82.6 | Cutie+ (base) |
| Video | YouTube-VOS 2018 | F-Measure (Seen) | 91 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2018 | F-Measure (Unseen) | 90.1 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2018 | Jaccard (Seen) | 86.6 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 82.2 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2018 | Overall | 87.5 | Cutie+ (base, MEGA) |
| Video | YouTube-VOS 2018 | Speed (FPS) | 17.9 | Cutie+ (base, MEGA) |
| Video | BURST-val | HOTA (all) | 61.2 | Cutie (base, MEGA, 600 pixels) |
| Video | BURST-val | HOTA (common) | 65 | Cutie (base, MEGA, 600 pixels) |
| Video | BURST-val | HOTA (uncommon) | 60.3 | Cutie (base, MEGA, 600 pixels) |
| Video | BURST-val | HOTA (all) | 58.4 | Cutie (base, with mose, 600 pixels) |
| Video | BURST-val | HOTA (common) | 61.8 | Cutie (base, with mose, 600 pixels) |
| Video | BURST-val | HOTA (uncommon) | 57.5 | Cutie (base, with mose, 600 pixels) |
| Object Tracking | DiDi | Tracking quality | 0.575 | Cutie |
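Most MOSE, DAVIS, and YouTube-VOS rows above report J (region similarity, the Jaccard index or IoU between the predicted and ground-truth masks), F (boundary F-measure), and J&F, which is their mean. For example, Cutie+ (base, MEGA) on MOSE has (67.6 + 75.8) / 2 = 71.7. The reported J and F are already rounded, so a J&F recomputed from them can differ by about 0.05. Cutie (base, MEGA) on MOSE, for instance, gives 69.95 against a reported 69.9. Below is a simplified single-mask sketch of these metrics. The helper names are hypothetical, and the fixed pixel tolerance with Chebyshev distance is an assumption. It stands in for the official DAVIS evaluation, which derives its boundary tolerance from the image size and averages over objects and frames.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # binary mask: 1 = object, 0 = background
Pixel = Tuple[int, int]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection-over-union of two same-shape masks."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def boundary(mask: Mask) -> Set[Pixel]:
    """Object pixels with at least one 4-neighbour outside the object or image."""
    h, w = len(mask), len(mask[0])
    pts = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.add((y, x))
                    break
    return pts


def boundary_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """Simplified boundary F: a boundary pixel is matched if the other
    boundary has a pixel within `tol` (Chebyshev distance)."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0

    def matched(src: Set[Pixel], dst: Set[Pixel]) -> int:
        return sum(any(max(abs(y - v), abs(x - u)) <= tol for v, u in dst)
                   for y, x in src)

    precision = matched(bp, bg) / len(bp)
    recall = matched(bg, bp) / len(bg)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """J&F: the mean of region similarity J and boundary accuracy F."""
    return (jaccard(pred, gt) + boundary_f(pred, gt, tol)) / 2


if __name__ == "__main__":
    gt = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]  # shifted right
    print(round(jaccard(pred, gt), 3))  # 0.333
    print(round(j_and_f(pred, gt), 3))  # 0.667
    # Table check, MOSE, Cutie+ (base, MEGA): J = 67.6, F = 75.8.
    print(round((67.6 + 75.8) / 2, 1))  # 71.7
```

The shifted-mask example shows why both terms are reported. A one-pixel shift halves the overlap (J = 1/3), yet every boundary pixel stays within the one-pixel tolerance (F = 1). J&F therefore rewards masks that are roughly aligned even when their regions overlap poorly.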