Jae Shin Yoon, Francois Rameau, Junsik Kim, Seokju Lee, Seunghak Shin, In So Kweon
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNNs). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from layers at different depths in order to take advantage of both spatial details and category-level semantic information. Furthermore, we propose a feature compression technique that drastically reduces memory requirements while maintaining the capability of feature representation. Two-stage training (pre-training and fine-tuning) allows our network to handle any target object regardless of its category (even if the object's type does not appear in the pre-training data) and regardless of variations in its appearance throughout a video sequence. Experiments on large datasets demonstrate the effectiveness of our model against related methods in terms of accuracy, speed, and stability. Finally, we show the transferability of our network to different domains, such as the infrared data domain.
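To make the idea of pixel-level matching concrete, the following is a minimal sketch and not the network described above: it swaps the learned CNN similarity for plain cosine similarity between hand-supplied per-pixel feature vectors. The function names (`cosine`, `match_scores`, `segment`) and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_scores(query_feats, target_feats):
    """For each query pixel feature, return its best similarity to any target pixel feature."""
    if not target_feats:
        raise ValueError("target_feats must contain at least one feature vector")
    return [max(cosine(q, t) for t in target_feats) for q in query_feats]

def segment(query_feats, target_feats, threshold=0.5):
    """Label a query pixel as target when its best match score reaches the threshold.

    Assumption: a fixed threshold stands in for the learned decision of a real model.
    """
    return [s >= threshold for s in match_scores(query_feats, target_feats)]

# Toy example: one target feature, two query pixels (one aligned, one orthogonal).
target = [[1.0, 0.0]]
query = [[2.0, 0.0], [0.0, 1.0]]
labels = segment(query, target)
```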
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | DAVIS 2016 | F-measure (Decay) | 14.7 | PLM |
| Video | DAVIS 2016 | F-measure (Mean) | 62.5 | PLM |
| Video | DAVIS 2016 | F-measure (Recall) | 73.2 | PLM |
| Video | DAVIS 2016 | J&F | 66.35 | PLM |
| Video | DAVIS 2016 | Jaccard (Decay) | 11.2 | PLM |
| Video | DAVIS 2016 | Jaccard (Mean) | 70.2 | PLM |
| Video | DAVIS 2016 | Jaccard (Recall) | 86.3 | PLM |
| Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 14.7 | PLM |
| Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 62.5 | PLM |
| Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 73.2 | PLM |
| Video Object Segmentation | DAVIS 2016 | J&F | 66.35 | PLM |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 11.2 | PLM |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 70.2 | PLM |
| Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 86.3 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Decay) | 14.7 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Mean) | 62.5 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | F-measure (Recall) | 73.2 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | J&F | 66.35 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Decay) | 11.2 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Mean) | 70.2 | PLM |
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | Jaccard (Recall) | 86.3 | PLM |
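
As a reading aid for the table, the J&F value is the average of the Jaccard (Mean) and F-measure (Mean) rows. The sketch below shows that arithmetic together with a plain intersection-over-union computation on binary masks. It is an illustrative helper, not the official DAVIS evaluation code; the function names are hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two equally sized binary masks.

    pred, gt: 2-D lists of 0/1 values.
    Assumption: two empty masks count as a perfect match (1.0).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return 1.0 if union == 0 else inter / union

def j_and_f(j_mean, f_mean):
    """Overall score: arithmetic mean of Jaccard (Mean) and F-measure (Mean)."""
    return (j_mean + f_mean) / 2.0

# Values taken from the table above for PLM on DAVIS 2016.
overall = j_and_f(70.2, 62.5)
```

With the table's values, `round(overall, 2)` agrees with the reported J&F of 66.35.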