Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Cascade Mask R-CNN

Computer Vision | Introduced 2000 | 23 papers

Description

Cascade Mask R-CNN extends Cascade R-CNN to instance segmentation by adding a mask head to the cascade.

In Mask R-CNN, the segmentation branch is inserted in parallel to the detection branch. Cascade R-CNN, however, has multiple detection branches, which raises two questions: 1) where to add the segmentation branch and 2) how many segmentation branches to add. The authors consider three strategies for mask prediction in Cascade R-CNN.

The first two strategies address the first question by adding a single mask prediction head at either the first or the last stage of the cascade. Because the instances used to train the segmentation branch are the positives of the detection branch at that stage, their number differs between these two strategies: placing the segmentation head later in the cascade leads to more examples. However, because segmentation is a pixel-wise operation, a large number of highly overlapping instances is not necessarily as helpful as it is for object detection, which is a patch-based operation.

The third strategy addresses the second question by adding a segmentation branch to each cascade stage, which maximizes the diversity of samples used to learn the mask prediction task.
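The three placements can be summarized as a small lookup from strategy to the cascade stages that carry a mask head. This is only an illustrative sketch: the names `MaskStrategy` and `mask_head_stages` are hypothetical, and the default of three cascade stages is an assumption rather than something stated in the description.

```python
from enum import Enum
from typing import List


class MaskStrategy(Enum):
    """Hypothetical names for the three mask-head placements described above."""
    FIRST_STAGE = "first"  # single mask head on the first cascade stage
    LAST_STAGE = "last"    # single mask head on the last cascade stage
    ALL_STAGES = "all"     # one mask head on every cascade stage


def mask_head_stages(strategy: MaskStrategy, num_stages: int = 3) -> List[int]:
    """Return the 0-based indices of the cascade stages that get a mask head.

    num_stages=3 is an assumed default, used only for illustration.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if strategy is MaskStrategy.FIRST_STAGE:
        return [0]
    if strategy is MaskStrategy.LAST_STAGE:
        return [num_stages - 1]
    return list(range(num_stages))


if __name__ == "__main__":
    for strategy in MaskStrategy:
        print(strategy.name, mask_head_stages(strategy))
```

During training, each mask head in this sketch would see the positives of the detection branch at its own stage, which is why the number of training instances depends on the placement.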

At inference time, all three strategies predict the segmentation masks on the patches produced by the final object detection stage, irrespective of the cascade stage to which the segmentation branch is attached and of how many segmentation branches there are.
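The inference rule can be sketched with toy stand-ins for the network components. Everything here is hypothetical: `cascade_inference`, the box-refinement callables, and the mask heads are placeholders, a single float stands in for a full mask, and averaging the outputs of several mask heads is an assumption made for the sketch (the description does not say how multiple heads are combined). The point illustrated is only that masks are computed on the boxes from the final detection stage.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
BoxStage = Callable[[Box], Box]          # toy stand-in for one detection stage
MaskHead = Callable[[Box], float]        # toy stand-in: one float per "mask"


def cascade_inference(
    proposals: Sequence[Box],
    box_stages: Sequence[BoxStage],
    mask_heads: Sequence[MaskHead],
) -> Tuple[List[Box], List[float]]:
    """Refine boxes through every stage, then predict masks on the final boxes."""
    if not mask_heads:
        raise ValueError("at least one mask head is required")

    boxes = list(proposals)
    for refine in box_stages:
        # Each stage refines the boxes produced by the previous stage.
        boxes = [refine(box) for box in boxes]

    # Masks are always predicted on the final-stage boxes, regardless of
    # which stage(s) the mask heads were attached to during training.
    masks = []
    for box in boxes:
        predictions = [head(box) for head in mask_heads]
        # Assumption: average when there is more than one mask head.
        masks.append(sum(predictions) / len(predictions))
    return boxes, masks


if __name__ == "__main__":
    def shift(box: Box) -> Box:
        # Toy refinement: move the box one unit to the right.
        return (box[0] + 1.0, box[1], box[2] + 1.0, box[3])

    def toy_head(box: Box) -> float:
        # Toy mask head: report the left edge of the box it was given.
        return box[0]

    final_boxes, masks = cascade_inference(
        [(0.0, 0.0, 10.0, 10.0)], [shift, shift, shift], [toy_head]
    )
    print(final_boxes, masks)
```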

Papers Using This Method

RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety (2025-04-01)
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels (2025-02-27)
Hierarchical Side-Tuning for Vision Transformers (2023-10-09)
DMKD: Improving Feature-based Knowledge Distillation for Object Detection Via Dual Masking Augmentation (2023-09-06)
Non-Hierarchical Transformers for Pedestrian Segmentation (2023-07-11)
BiViT: Extremely Compressed Binary Vision Transformers (2023-01-01)
BiViT: Extremely Compressed Binary Vision Transformer (2022-11-14)
A Tri-Layer Plugin to Improve Occluded Detection (2022-10-18)
FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer (2021-11-27)
VTLayout: Fusion of Visual and Text Features for Document Layout Analysis (2021-08-12)
K-Net: Towards Unified Image Segmentation (2021-06-28)
TNCR: Table Net Detection and Classification Dataset (2021-06-19)
A2-FPN: Attention Aggregation Based Feature Pyramid Network for Instance Segmentation (2021-06-19)
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation (2021-05-07)
Instances as Queries (2021-05-05)
Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNN (2021-01-07)
SCNet: Training Inference Sample Consistency for Instance Segmentation (2020-12-18)
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation (2020-12-13)
CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents (2020-04-27)
CBNet: A Novel Composite Backbone Network Architecture for Object Detection (2019-09-09)