Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

InternImage-H

Reported on 53 benchmarks across 9 tasks · 1 paper · 47 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
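
As a minimal sketch of what exact-name matching implies, the snippet below groups a few results by the literal model name string. The record layout and the function name group_by_exact_name are hypothetical, chosen for illustration only; the values are copied from this page. It is not the site's actual matching code.

```python
from collections import defaultdict

# Hypothetical records: (model_name, benchmark, metric, value).
# The tuple layout is an assumption; the numbers come from this page.
RESULTS = [
    ("InternImage-H", "COCO minival", "box AP", 65.0),
    ("InternImage-H", "ADE20K", "Validation mIoU", 62.9),
    ("ViT-P (InternImage-H)", "ADE20K", "Validation mIoU", 63.6),
]


def group_by_exact_name(results):
    """Group results by the model name string, with no normalization."""
    groups = defaultdict(list)
    for name, benchmark, metric, value in results:
        groups[name].append((benchmark, metric, value))
    return dict(groups)


groups = group_by_exact_name(RESULTS)
# Only the two rows whose name is exactly "InternImage-H" land in that group.
print(len(groups["InternImage-H"]))
```

Under this assumption, "ViT-P (InternImage-H)" forms its own group, and a differently cased or spaced name would not match either. Conversely, two papers that reuse the same name for different variants would be merged into one group.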

Methodology · 25 results

  • 3D on CrowdHuman (full body)
    AP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 3D on LVIS v1.0 minival
    box AP · 2022-11-10
    65.8
    best: 72 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 3D on OpenImages-v6
    box AP · 2022-11-10
    74.1
    best: 76.2 (ScaleDet)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 3D on PASCAL VOC 2012
    MAP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 3D on COCO minival
    box AP · uses extra data · 2022-11-10
    65
    best: 66 (PE_spatial (DETA))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 3D on LVIS v1.0 val
    box AP · uses extra data · 2022-11-10
    63.2
    best: 68 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Classification on CrowdHuman (full body)
    AP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Classification on LVIS v1.0 minival
    box AP · 2022-11-10
    65.8
    best: 72 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Classification on OpenImages-v6
    box AP · 2022-11-10
    74.1
    best: 76.2 (ScaleDet)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Classification on PASCAL VOC 2012
    MAP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Classification on COCO minival
    box AP · uses extra data · 2022-11-10
    65
    best: 66 (PE_spatial (DETA))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Classification on LVIS v1.0 val
    box AP · uses extra data · 2022-11-10
    63.2
    best: 68 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on BDD100K val
    mAP · 2022-11-10
    38.8
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on CrowdHuman (full body)
    AP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on LVIS v1.0 minival
    box AP · 2022-11-10
    65.8
    best: 72 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on OpenImages-v6
    box AP · 2022-11-10
    74.1
    best: 76.2 (ScaleDet)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on PASCAL VOC 2012
    MAP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on COCO minival
    box AP · uses extra data · 2022-11-10
    65
    best: 66 (PE_spatial (DETA))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 2D Object Detection on LVIS v1.0 val
    box AP · uses extra data · 2022-11-10
    63.2
    best: 68 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 16k on CrowdHuman (full body)
    AP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 16k on LVIS v1.0 minival
    box AP · 2022-11-10
    65.8
    best: 72 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 16k on OpenImages-v6
    box AP · 2022-11-10
    74.1
    best: 76.2 (ScaleDet)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 16k on PASCAL VOC 2012
    MAP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 16k on COCO minival
    box AP · uses extra data · 2022-11-10
    65
    best: 66 (PE_spatial (DETA))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 16k on LVIS v1.0 val
    box AP · uses extra data · 2022-11-10
    63.2
    best: 68 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778

Computer Vision · 18 results

  • Object Detection on CrowdHuman (full body)
    AP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Object Detection on LVIS v1.0 minival
    box AP · 2022-11-10
    65.8
    best: 72 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Object Detection on OpenImages-v6
    box AP · 2022-11-10
    74.1
    best: 76.2 (ScaleDet)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Object Detection on PASCAL VOC 2012
    MAP · 2022-11-10
    97.2
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Object Detection on COCO minival
    box AP · uses extra data · 2022-11-10
    65
    best: 66 (PE_spatial (DETA))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Object Detection on LVIS v1.0 val
    box AP · uses extra data · 2022-11-10
    63.2
    best: 68 (Co-DETR (single-scale))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Image Classification on ImageNet
    GFLOPs · 2022-11-10
    1478
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO minival
    AP50 · uses extra data · 2022-11-10
    80.1
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO minival
    AP75 · uses extra data · 2022-11-10
    61.5
    best: 62.8 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO minival
    APL · uses extra data · 2022-11-10
    74.4
    best: 74.6 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO minival
    APM · uses extra data · 2022-11-10
    58.4
    best: 59.7 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO minival
    APS · uses extra data · 2022-11-10
    37.9
    best: 38.9 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO minival
    mask AP · uses extra data · 2022-11-10
    55.4
    best: 56.6 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO test-dev
    AP50 · uses extra data · 2022-11-10
    80.8
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO test-dev
    AP75 · uses extra data · 2022-11-10
    62.2
    best: 63.4 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO test-dev
    APS · uses extra data · 2022-11-10
    41
    best: 41.6 (Co-DETR)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO test-dev
    APL · uses extra data · 2022-11-10
    70.3
    best: 72.4 (EVA)
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Instance Segmentation on COCO test-dev
    APM · uses extra data · 2022-11-10
    58.9
    best: 60.1 (Co-DETR)
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778

Medical · 5 results

  • Semantic Segmentation on PASCAL Context
    mIoU · 2022-11-10
    70.3
    best: 71.1 (VPNeXt)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Semantic Segmentation on ADE20K
    GFLOPs · uses extra data · 2022-11-10
    4635
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Semantic Segmentation on ADE20K
    Validation mIoU · uses extra data · 2022-11-10
    62.9
    best: 63.6 (ViT-P (InternImage-H))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Semantic Segmentation on Cityscapes val
    mIoU · uses extra data · 2022-11-10
    87
    best: 90.3 (EfficientPS (Cityscapes-fine))
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • Semantic Segmentation on ADE20K
    Params (M) · uses extra data · 2022-11-10
    1310
    best: 3000 (FD-SwinV2-G)
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778

Audio · 5 results

  • 10-shot image generation on PASCAL Context
    mIoU · 2022-11-10
    70.3
    best: 71.1 (VPNeXt)
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 10-shot image generation on ADE20K
    GFLOPs · uses extra data · 2022-11-10
    4635
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 10-shot image generation on ADE20K
    Validation mIoU · uses extra data · 2022-11-10
    62.9
    best: 63.6 (ViT-P (InternImage-H))
    SOTA
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 10-shot image generation on Cityscapes val
    mIoU · uses extra data · 2022-11-10
    87
    best: 90.3 (EfficientPS (Cityscapes-fine))
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778
  • 10-shot image generation on ADE20K
    Params (M) · uses extra data · 2022-11-10
    1310
    best: 3000 (FD-SwinV2-G)
    InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · arXiv:2211.05778