Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Published: 2022-11-10. Venue: CVPR 2023.
Tasks: Image Classification, Semantic Segmentation, Instance Segmentation, 2D Object Detection, Classification, Object Detection

Abstract

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs. The code will be released at https://github.com/OpenGVLab/InternImage.
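
To make the idea of deformable convolution as an operator with input-dependent sampling concrete, the following is a minimal pure-Python sketch of a single 3x3 deformable sampling step on a one-channel image. It illustrates the generic deformable-convolution idea (a regular grid plus per-tap offsets, bilinear sampling, optional per-tap modulation) and is not the InternImage operator or its released code. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets are passed in by hand here, whereas in a real model they would be predicted from the input features.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (a list of rows) at fractional (y, x).

    Positions outside the image contribute zero.
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deformable_conv_at(img, cy, cx, weights, offsets, modulation=None):
    """Response of one 3x3 deformable kernel centred at pixel (cy, cx).

    weights:    9 kernel weights, row-major over the regular 3x3 grid.
    offsets:    9 (dy, dx) pairs added to the regular grid positions
                (hand-supplied here; an assumption for illustration).
    modulation: optional 9 scalars that rescale each sampled value.
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    if modulation is None:
        modulation = [1.0] * 9
    out = 0.0
    for (gy, gx), (oy, ox), w, m in zip(grid, offsets, weights, modulation):
        out += w * m * bilinear_sample(img, cy + gy + oy, cx + gx + ox)
    return out


# Usage: a 4x4 ramp image, an averaging kernel, and two offset settings.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
kernel = [1.0 / 9.0] * 9
regular = deformable_conv_at(image, 1, 1, kernel, [(0.0, 0.0)] * 9)
shifted = deformable_conv_at(image, 1, 1, kernel, [(0.5, 0.5)] * 9)
```

With all offsets at zero the function reduces to an ordinary 3x3 convolution response. Nonzero offsets move each sampling point off the regular grid, which is what lets the effective receptive field adapt to the input.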

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | Replica | mIoU | 38.4 | InternImage |
| Semantic Segmentation | Cityscapes val | mIoU | 87 | InternImage-H |
| Semantic Segmentation | Cityscapes val | mIoU | 86.4 | InternImage-XL |
| Semantic Segmentation | PASCAL Context | mIoU | 70.3 | InternImage-H |
| Semantic Segmentation | ADE20K | GFLOPs | 4635 | InternImage-H |
| Semantic Segmentation | ADE20K | Params (M) | 1310 | InternImage-H |
| Semantic Segmentation | ADE20K | Validation mIoU | 62.9 | InternImage-H |
| Semantic Segmentation | ADE20K | GFLOPs | 3142 | InternImage-XL |
| Semantic Segmentation | ADE20K | Params (M) | 368 | InternImage-XL |
| Semantic Segmentation | ADE20K | Validation mIoU | 55.3 | InternImage-XL |
| Semantic Segmentation | ADE20K | GFLOPs | 2526 | InternImage-L |
| Semantic Segmentation | ADE20K | Params (M) | 256 | InternImage-L |
| Semantic Segmentation | ADE20K | Validation mIoU | 54.1 | InternImage-L |
| Semantic Segmentation | ADE20K | GFLOPs | 1185 | InternImage-B |
| Semantic Segmentation | ADE20K | Params (M) | 128 | InternImage-B |
| Semantic Segmentation | ADE20K | Validation mIoU | 51.3 | InternImage-B |
| Semantic Segmentation | ADE20K | GFLOPs | 1017 | InternImage-S |
| Semantic Segmentation | ADE20K | Params (M) | 80 | InternImage-S |
| Semantic Segmentation | ADE20K | Validation mIoU | 50.9 | InternImage-S |
| Semantic Segmentation | ADE20K | GFLOPs | 944 | InternImage-T |
| Semantic Segmentation | ADE20K | Params (M) | 59 | InternImage-T |
| Semantic Segmentation | ADE20K | Validation mIoU | 48.1 | InternImage-T |
| Semantic Segmentation | ADE20K | Params (M) | 1310 | InternImage-H (M3I Pre-training) |
| Object Detection | CrowdHuman (full body) | AP | 97.2 | InternImage-H |
| Object Detection | LVIS v1.0 minival | box AP | 65.8 | InternImage-H |
| Object Detection | COCO test-dev | Params (M) | 2180 | InternImage-H (M3I Pre-training) |
| Object Detection | COCO test-dev | box mAP | 65.5 | InternImage-H (M3I Pre-training) |
| Object Detection | COCO test-dev | Params (M) | 602 | InternImage-XL |
| Object Detection | COCO test-dev | box mAP | 64.3 | InternImage-XL |
| Object Detection | COCO-O | Average mAP | 37 | InternImage-L (Cascade Mask R-CNN) |
| Object Detection | COCO-O | Effective Robustness | 11.72 | InternImage-L (Cascade Mask R-CNN) |
| Object Detection | OpenImages-v6 | box AP | 74.1 | InternImage-H |
| Object Detection | PASCAL VOC 2012 | MAP | 97.2 | InternImage-H |
| Object Detection | COCO minival | box AP | 65 | InternImage-H |
| Object Detection | COCO minival | box AP | 64.2 | InternImage-XL |
| Object Detection | LVIS v1.0 val | box AP | 63.2 | InternImage-H |
| Image Classification | ImageNet | GFLOPs | 1478 | InternImage-H |
| Image Classification | ImageNet | GFLOPs | 163 | InternImage-XL |
| Image Classification | ImageNet | GFLOPs | 108 | InternImage-L |
| Image Classification | ImageNet | GFLOPs | 16 | InternImage-B |
| Image Classification | ImageNet | GFLOPs | 8 | InternImage-S |
| Instance Segmentation | COCO minival | AP50 | 80.1 | InternImage-H |
| Instance Segmentation | COCO minival | AP75 | 61.5 | InternImage-H |
| Instance Segmentation | COCO minival | APL | 74.4 | InternImage-H |
| Instance Segmentation | COCO minival | APM | 58.4 | InternImage-H |
| Instance Segmentation | COCO minival | APS | 37.9 | InternImage-H |
| Instance Segmentation | COCO minival | mask AP | 55.4 | InternImage-H |
| Instance Segmentation | COCO minival | GFLOPs | 1782 | InternImage-XL |
| Instance Segmentation | COCO minival | Params (M) | 387 | InternImage-XL |
| Instance Segmentation | COCO minival | mask AP | 48.8 | InternImage-XL |
| Instance Segmentation | COCO minival | GFLOPs | 1399 | InternImage-L |
| Instance Segmentation | COCO minival | Params (M) | 277 | InternImage-L |
| Instance Segmentation | COCO minival | box AP | 56.1 | InternImage-L |
| Instance Segmentation | COCO minival | mask AP | 48.5 | InternImage-L |
| Instance Segmentation | COCO minival | GFLOPs | 340 | InternImage-S |
| Instance Segmentation | COCO minival | Params (M) | 69 | InternImage-S |
| Instance Segmentation | COCO minival | box AP | 49.7 | InternImage-S |
| Instance Segmentation | COCO minival | mask AP | 44.5 | InternImage-S |
| Instance Segmentation | COCO minival | GFLOPs | 270 | InternImage-T |
| Instance Segmentation | COCO minival | Params (M) | 49 | InternImage-T |
| Instance Segmentation | COCO minival | box AP | 49.1 | InternImage-T |
| Instance Segmentation | COCO minival | mask AP | 43.7 | InternImage-T |
| Instance Segmentation | COCO minival | GFLOPs | 501 | InternImage-B |
| Instance Segmentation | COCO minival | Params (M) | 115 | InternImage-B |
| Instance Segmentation | COCO test-dev | AP50 | 80.8 | InternImage-H |
| Instance Segmentation | COCO test-dev | AP75 | 62.2 | InternImage-H |
| Instance Segmentation | COCO test-dev | APL | 70.3 | InternImage-H |
| Instance Segmentation | COCO test-dev | APM | 58.9 | InternImage-H |
| Instance Segmentation | COCO test-dev | APS | 41 | InternImage-H |
| 2D Object Detection | BDD100K val | mAP | 38.8 | InternImage-H |
| 2D Object Detection | CrowdHuman (full body) | AP | 97.2 | InternImage-H |
| 2D Object Detection | LVIS v1.0 minival | box AP | 65.8 | InternImage-H |
| 2D Object Detection | COCO test-dev | Params (M) | 2180 | InternImage-H (M3I Pre-training) |
| 2D Object Detection | COCO test-dev | box mAP | 65.5 | InternImage-H (M3I Pre-training) |
| 2D Object Detection | COCO test-dev | Params (M) | 602 | InternImage-XL |
| 2D Object Detection | COCO test-dev | box mAP | 64.3 | InternImage-XL |
| 2D Object Detection | COCO-O | Average mAP | 37 | InternImage-L (Cascade Mask R-CNN) |
| 2D Object Detection | COCO-O | Effective Robustness | 11.72 | InternImage-L (Cascade Mask R-CNN) |
| 2D Object Detection | OpenImages-v6 | box AP | 74.1 | InternImage-H |
| 2D Object Detection | PASCAL VOC 2012 | MAP | 97.2 | InternImage-H |
| 2D Object Detection | COCO minival | box AP | 65 | InternImage-H |
| 2D Object Detection | COCO minival | box AP | 64.2 | InternImage-XL |
| 2D Object Detection | LVIS v1.0 val | box AP | 63.2 | InternImage-H |

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)