Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Parameter-Inverted Image Pyramid Networks

Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

2024-06-06 · Image Classification, Semantic Segmentation, Object Detection

Abstract

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.
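
The pairing rule described in the abstract (higher-resolution images go to smaller networks) can be illustrated with a rough sketch. This is not the authors' implementation: the branch names, parameter counts, resolutions, and patch size below are hypothetical, and the cost proxy (parameters times number of patch tokens) is a simplifying assumption that ignores the feature interaction mechanism and the quadratic cost of attention.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str         # hypothetical model label
    params_m: float   # parameter count in millions (hypothetical value)
    resolution: int   # side length of the square input, in pixels


def relative_cost(branch: Branch, patch: int = 16) -> float:
    """Crude cost proxy (an assumption): parameters x number of patch tokens."""
    tokens = (branch.resolution // patch) ** 2
    return branch.params_m * tokens


def parameter_inverted(models, resolutions):
    """Pair the smallest model with the highest resolution, and so on."""
    by_size = sorted(models, key=lambda m: m[1])    # ascending parameter count
    by_res = sorted(resolutions, reverse=True)      # descending resolution
    return [Branch(name, params, res) for (name, params), res in zip(by_size, by_res)]


def standard_pyramid(models, resolutions):
    """Baseline: the single largest model processes every resolution."""
    name, params = max(models, key=lambda m: m[1])
    return [Branch(name, params, res) for res in resolutions]


# Hypothetical model sizes and pyramid levels, chosen only for illustration.
models = [("small", 22.0), ("base", 86.0), ("large", 304.0)]
resolutions = [256, 512, 1024]

inverted = parameter_inverted(models, resolutions)
baseline = standard_pyramid(models, resolutions)

inverted_cost = sum(relative_cost(b) for b in inverted)
baseline_cost = sum(relative_cost(b) for b in baseline)

for b in inverted:
    print(f"{b.name:>5} model @ {b.resolution}px")
print(f"cost ratio (inverted / baseline): {inverted_cost / baseline_cost:.3f}")
```

Under this toy proxy the inverted pairing costs far less than running the largest model at every scale. The ratio it prints depends entirely on the made-up numbers and should not be compared with the 40%-60% figure reported in the abstract.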

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K | Validation mIoU | 59.9 | PIIP-LH6B (UperNet) |
| Object Detection | COCO test-dev | AP50 | 79 | PIIP-H6B (DINO) |
| Object Detection | COCO test-dev | AP75 | 65.4 | PIIP-H6B (DINO) |
| Object Detection | COCO test-dev | box mAP | 60 | PIIP-H6B (DINO) |

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)