Linjie Yang, Yanran Wang, Xuehan Xiong, Jianchao Yang, Aggelos K. Katsaggelos
Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame. Recent deep-learning-based approaches do this effectively by fine-tuning a general-purpose segmentation model on the annotated frame using hundreds of iterations of gradient descent. Despite the high accuracy these methods achieve, the fine-tuning process is inefficient and fails to meet the requirements of real-world applications. We propose a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object. Specifically, a second meta neural network, named the modulator, is learned to manipulate the intermediate layers of the segmentation network given limited visual and spatial information about the target object. Experiments show that our approach is 70 times faster than fine-tuning approaches while achieving similar accuracy.
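
To make the idea of "manipulating intermediate layers" concrete, the following is a minimal sketch of one plausible form of modulation: a per-channel scale (standing in for the visual information) and a per-location bias (standing in for the spatial information) applied to a feature map. The abstract does not specify the exact operation, so the function name `modulate`, its parameters, and the scale-plus-bias form are all assumptions made for illustration. In an actual system the scale and bias would be predicted by the modulator network in one forward pass; here they are passed in directly.

```python
def modulate(features, channel_scale, spatial_bias):
    """Apply a hypothetical channel-wise scale and spatial bias to a feature map.

    features:      list of C channels, each an H x W list of lists of floats
    channel_scale: list of C floats (assumed output of a visual modulator)
    spatial_bias:  H x W list of lists of floats (assumed output of a spatial modulator)
    Returns a new C x H x W nested list; the inputs are not modified.
    """
    if len(channel_scale) != len(features):
        raise ValueError("need exactly one scale per channel")

    modulated = []
    for scale, channel in zip(channel_scale, features):
        if len(channel) != len(spatial_bias):
            raise ValueError("spatial bias height must match the feature map")
        new_channel = []
        for row, bias_row in zip(channel, spatial_bias):
            if len(row) != len(bias_row):
                raise ValueError("spatial bias width must match the feature map")
            # Same scale for every location in the channel,
            # same bias for every channel at a given location.
            new_channel.append([scale * value + bias for value, bias in zip(row, bias_row)])
        modulated.append(new_channel)
    return modulated
```

Because the adaptation is a single multiply-and-add over the features rather than hundreds of gradient steps, a design along these lines is consistent with the speed advantage claimed in the abstract.
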
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | DAVIS 2017 (val) | F-measure (Decay) | 24.3 | OSMN |
| Video | DAVIS 2017 (val) | F-measure (Mean) | 57.1 | OSMN |
| Video | DAVIS 2017 (val) | F-measure (Recall) | 66.1 | OSMN |
| Video | DAVIS 2017 (val) | J&F | 54.8 | OSMN |
| Video | DAVIS 2017 (val) | Jaccard (Decay) | 21.5 | OSMN |
| Video | DAVIS 2017 (val) | Jaccard (Mean) | 52.5 | OSMN |
| Video | DAVIS 2017 (val) | Jaccard (Recall) | 60.9 | OSMN |
| Video | DAVIS 2016 | F-measure (Decay) | 10.6 | OSMN |
| Video | DAVIS 2016 | F-measure (Mean) | 72.9 | OSMN |
| Video | DAVIS 2016 | F-measure (Recall) | 84 | OSMN |
| Video | DAVIS 2016 | J&F | 73.45 | OSMN |
| Video | DAVIS 2016 | Jaccard (Decay) | 9 | OSMN |
| Video | DAVIS 2016 | Jaccard (Mean) | 74 | OSMN |
| Video | DAVIS 2016 | Jaccard (Recall) | 87.6 | OSMN |
| Video | DAVIS 2017 (test-dev) | F-measure (Decay) | 17.4 | OSMN |
| Video | DAVIS 2017 (test-dev) | F-measure (Recall) | 47.4 | OSMN |
| Video | DAVIS 2017 (test-dev) | J&F | 41.3 | OSMN |
| Video | DAVIS 2017 (test-dev) | Jaccard (Decay) | 19 | OSMN |
| Video | DAVIS 2017 (test-dev) | Jaccard (Mean) | 37.7 | OSMN |
| Video | DAVIS 2017 (test-dev) | Jaccard (Recall) | 38.9 | OSMN |
| Video | YouTube-VOS 2018 | F-Measure (Seen) | 60.1 | OSMN |
| Video | YouTube-VOS 2018 | F-Measure (Unseen) | 44 | OSMN |
| Video | YouTube-VOS 2018 | Jaccard (Seen) | 60 | OSMN |
| Video | YouTube-VOS 2018 | Jaccard (Unseen) | 40.6 | OSMN |
| Video | YouTube-VOS 2018 | Overall | 51.2 | OSMN |
| Video | YouTube-VOS 2018 | Speed (FPS) | 7.14 | OSMN |
| Object Tracking | YouTube-VOS 2018 | F-Measure (Seen) | 60.1 | OSMN |
| Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 44 | OSMN |
| Object Tracking | YouTube-VOS 2018 | Jaccard (Seen) | 60 | OSMN |
| Object Tracking | YouTube-VOS 2018 | O (Average of Measures) | 51.2 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 24.3 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 57.1 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 66.1 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | J&F | 54.8 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 21.5 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 52.5 | OSMN |
| Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 60.9 | OSMN |
| Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 10.6 | OSMN |
| Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 72.9 | OSMN |
| Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 84 | OSMN |
| Video Object Segmentation | DAVIS 2016 | J&F | 73.45 | OSMN |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 9 | OSMN |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 74 | OSMN |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 87.6 | OSMN |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 17.4 | OSMN |
| Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 47.4 | OSMN |
| Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 41.3 | OSMN |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 19 | OSMN |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 37.7 | OSMN |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 38.9 | OSMN |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 60.1 | OSMN |
| Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 44 | OSMN |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 60 | OSMN |
| Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 40.6 | OSMN |
| Video Object Segmentation | YouTube-VOS 2018 | Overall | 51.2 | OSMN |
| Video Object Segmentation | YouTube-VOS 2018 | Speed (FPS) | 7.14 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Decay) | 24.3 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Mean) | 57.1 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | F-measure (Recall) | 66.1 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | J&F | 54.8 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Decay) | 21.5 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Mean) | 52.5 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | Jaccard (Recall) | 60.9 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 10.6 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 72.9 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 84 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 73.45 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 9 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 74 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 87.6 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Decay) | 17.4 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | F-measure (Recall) | 47.4 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | J&F | 41.3 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Decay) | 19 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Mean) | 37.7 | OSMN |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | Jaccard (Recall) | 38.9 | OSMN |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Seen) | 60.1 | OSMN |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | F-Measure (Unseen) | 44 | OSMN |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Seen) | 60 | OSMN |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Jaccard (Unseen) | 40.6 | OSMN |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Overall | 51.2 | OSMN |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | Speed (FPS) | 7.14 | OSMN |
| Video Instance Segmentation | YouTube-VIS validation | AP50 | 28.6 | OSMN |
| Video Instance Segmentation | YouTube-VIS validation | AP75 | 33.1 | OSMN |
| Video Instance Segmentation | YouTube-VIS validation | mask AP | 29.1 | OSMN |
| Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Seen) | 60.1 | OSMN |
| Visual Object Tracking | YouTube-VOS 2018 | F-Measure (Unseen) | 44 | OSMN |
| Visual Object Tracking | YouTube-VOS 2018 | Jaccard (Seen) | 60 | OSMN |
| Visual Object Tracking | YouTube-VOS 2018 | O (Average of Measures) | 51.2 | OSMN |
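
The aggregate rows in the table can be cross-checked against the component rows. The sketch below assumes the usual benchmark conventions: on DAVIS, J&F is the average of Jaccard (Mean) and F-measure (Mean); on YouTube-VOS, the overall score is the average of the four seen/unseen Jaccard and F-measure values. The function names are hypothetical helpers, not part of any benchmark toolkit.

```python
def davis_j_and_f(jaccard_mean, f_measure_mean):
    """Average of region similarity (J) and contour accuracy (F) means."""
    return (jaccard_mean + f_measure_mean) / 2


def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """Average of the four per-category measures."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# Values taken from the table above.
davis_2016 = davis_j_and_f(74, 72.9)
davis_2017_val = davis_j_and_f(52.5, 57.1)
youtube_vos = youtube_vos_overall(60, 40.6, 60.1, 44)
```

Under these assumptions the DAVIS 2016 and DAVIS 2017 (val) results reproduce the reported J&F values of 73.45 and 54.8, and the YouTube-VOS result of 51.175 rounds to the reported 51.2. The DAVIS 2017 (test-dev) rows cannot be checked the same way because no F-measure (Mean) row is listed for that split.
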