Implicit Motion Handling for Video Camouflaged Object Detection

Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, ZongYuan Ge

2022-03-14, CVPR 2022. Tasks: Motion Estimation, Camouflaged Object Segmentation, Segmentation, Semantic Segmentation, Object Detection


Abstract

We propose a new video camouflaged object detection (VCOD) framework that exploits both short-term dynamics and long-term temporal consistency to detect camouflaged objects in video frames. Camouflaged objects usually exhibit patterns similar to the background, which makes them hard to identify in still images. Effectively handling temporal dynamics is therefore key to the VCOD task, because camouflaged objects become noticeable when they move. However, current VCOD methods often rely on homography or optical flow to represent motion, so the detection error can accumulate from both the motion estimation error and the segmentation error. In contrast, our method unifies motion estimation and object segmentation within a single optimization framework. Specifically, we build a dense correlation volume to implicitly capture motion between neighbouring frames and use the final segmentation supervision to optimize the implicit motion estimation and the segmentation jointly. Furthermore, to enforce temporal consistency within a video sequence, we use a spatio-temporal transformer to refine the short-term predictions. Extensive experiments on VCOD benchmarks demonstrate the effectiveness of our architecture. We also provide a large-scale VCOD dataset named MoCA-Mask with pixel-level handcrafted ground-truth masks, and construct a comprehensive VCOD benchmark with previous methods to facilitate research in this direction. Dataset Link: https://xueliancheng.github.io/SLT-Net-project.
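
To make the idea of a dense correlation volume concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an all-pairs dot-product correlation between two per-frame feature maps, as is common in optical-flow work; the function names `correlation_volume` and `best_match_displacement`, the feature shapes, and the 1/sqrt(C) scaling are all illustrative assumptions. The abstract describes motion as staying implicit in the volume, so the argmax readout below exists only to show that the volume encodes displacement.

```python
import numpy as np


def correlation_volume(feat_a, feat_b):
    """All-pairs correlation between two feature maps of shape (C, H, W).

    Returns an array of shape (H, W, H, W) whose entry [i, j, k, l] is the
    dot product of feat_a[:, i, j] and feat_b[:, k, l], scaled by 1/sqrt(C).
    (Hypothetical helper; the scaling is an assumption.)
    """
    c, h, w = feat_a.shape
    a = feat_a.reshape(c, h * w)
    b = feat_b.reshape(c, h * w)
    corr = a.T @ b / np.sqrt(c)          # (H*W, H*W)
    return corr.reshape(h, w, h, w)


def best_match_displacement(corr):
    """Illustrative readout: per-pixel (dy, dx) to the best match in frame B."""
    h, w = corr.shape[:2]
    idx = corr.reshape(h, w, h * w).argmax(axis=-1)      # flat index of best match
    ky, kx = np.unravel_index(idx, (h, w))
    iy, ix = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([ky - iy, kx - ix], axis=-1)         # (H, W, 2)


# Toy example: frame B is frame A shifted by 1 row and 2 columns.
rng = np.random.default_rng(0)
frame_a = rng.standard_normal((64, 6, 8))
frame_a /= np.linalg.norm(frame_a, axis=0, keepdims=True)   # unit-norm features
frame_b = np.roll(frame_a, shift=(1, 2), axis=(1, 2))

volume = correlation_volume(frame_a, frame_b)
displacement = best_match_displacement(volume)
```

In this toy setting, pixels that do not wrap around the border recover the (1, 2) shift. In a framework like the one the abstract describes, the volume would presumably be passed to the segmentation head and trained only through the segmentation loss, with no explicit flow extracted.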

Results

Model: STL-Net-LT-PVTv2-B5
Task labels (identical values are listed under each): Object Detection, 3D, Camouflaged Object Segmentation, Object Segmentation, 2D Classification, 2D Object Detection, 16k

Dataset	Metric	Value
MoCA-Mask	MAE	0.027
MoCA-Mask	S-measure	0.631
MoCA-Mask	mDice	0.36
MoCA-Mask	mIoU	0.272
MoCA-Mask	weighted F-measure	0.311
Camouflaged Animal Dataset	MAE	0.03
Camouflaged Animal Dataset	S-measure	0.696
Camouflaged Animal Dataset	mDice	0.493
Camouflaged Animal Dataset	mIoU	0.402
Camouflaged Animal Dataset	weighted F-measure	0.481
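
For readers unfamiliar with the simpler metrics in the table, the sketch below shows how MAE, Dice, and IoU are commonly computed for a single predicted mask against its ground truth. It is an illustration under stated assumptions, not the benchmark's evaluation code: the 0.5 binarization threshold, the `eps` smoothing term, and the function names `mae` and `dice_and_iou` are assumptions, and the "m" prefix in mDice and mIoU is taken here to mean an average over frames. S-measure and weighted F-measure are more involved and are not reproduced.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a soft prediction and a mask, both in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))


def dice_and_iou(pred, gt, threshold=0.5, eps=1e-8):
    """Dice and IoU after binarizing the prediction (threshold is an assumption)."""
    p = pred >= threshold
    g = gt >= 0.5
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    dice = (2 * inter + eps) / (p.sum() + g.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


# Toy example: a 4-pixel object, and a prediction that covers it plus 4 extra pixels.
gt_mask = np.zeros((4, 4))
gt_mask[:2, :2] = 1.0
pred_mask = np.zeros((4, 4))
pred_mask[:2, :] = 1.0

frame_mae = mae(pred_mask, gt_mask)
frame_dice, frame_iou = dice_and_iou(pred_mask, gt_mask)
```

Lower is better for MAE, while higher is better for Dice and IoU. Averaging the per-frame Dice and IoU values over a dataset would give figures comparable in kind to the mDice and mIoU rows, assuming the benchmark follows this convention.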
