Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

2020-06-20

Tasks: Scene Parsing, Image Classification, Action Classification, Video Recognition, Semantic Segmentation, Video Classification, Action Recognition, Object Detection, Image Segmentation

Abstract

This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of detail in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost or the number of parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks in visual recognition: image classification, video action classification/recognition, object detection, and semantic image segmentation/parsing. Our approach shows significant improvements on all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layer network outperforms its 152-layer ResNet baseline counterpart in recognition performance on the ImageNet dataset, while having 2.39 times fewer parameters, 2.52 times lower computational complexity, and more than 3 times fewer layers. On image segmentation, our novel framework sets a new state of the art on the challenging ADE20K benchmark for scene parsing. Code is available at: https://github.com/iduta/pyconv
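The abstract's claim that a pyramid of kernels need not cost more parameters than a standard convolution can be checked with simple arithmetic. The sketch below is only an illustration, not the authors' implementation: it assumes each pyramid level is a grouped convolution, that larger kernels use more groups (that is, shallower kernel depth), and it uses a hypothetical four-level configuration with 64 input channels. The function names and the specific kernel sizes, group counts, and channel split are assumptions chosen for the example.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Weights in a 2D convolution without bias: out * (in / groups) * k * k."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


def pyconv_weight_count(in_channels, levels):
    """Sum weights over pyramid levels given as (out_channels, kernel_size, groups)."""
    return sum(conv_weight_count(in_channels, out_c, k, g) for out_c, k, g in levels)


# Hypothetical configuration (an assumption for illustration only):
# four levels, each producing 16 of the 64 output channels. Kernel size
# grows from 3 to 9 while the group count grows, so kernel depth shrinks.
IN_CHANNELS = 64
LEVELS = [(16, 3, 1), (16, 5, 4), (16, 7, 8), (16, 9, 16)]

# Baseline: a standard 3x3 convolution with 64 input and 64 output channels.
standard = conv_weight_count(IN_CHANNELS, 64, 3)
pyramid = pyconv_weight_count(IN_CHANNELS, LEVELS)
```

Under these assumed settings the standard layer has 36,864 weights and the pyramid has 27,072, so the larger kernels are paid for by their reduced depth. Other level configurations would give different totals.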

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K val | Pixel Accuracy | 82.49 | PyConvSegNet-152 |
| Semantic Segmentation | ADE20K val | mIoU | 45.99 | PyConvSegNet-152 |
| Semantic Segmentation | ADE20K | Test Score | 56.52 | PyConvSegNet-152 |
| Semantic Segmentation | ADE20K | Validation mIoU | 45.99 | PyConvSegNet-152 |
| 10-shot image generation | ADE20K val | Pixel Accuracy | 82.49 | PyConvSegNet-152 |
| 10-shot image generation | ADE20K val | mIoU | 45.99 | PyConvSegNet-152 |
| 10-shot image generation | ADE20K | Test Score | 56.52 | PyConvSegNet-152 |
| 10-shot image generation | ADE20K | Validation mIoU | 45.99 | PyConvSegNet-152 |
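For readers unfamiliar with the Pixel Accuracy and mIoU metrics reported in the results, the following minimal sketch shows how they are commonly computed from a confusion matrix. It uses the standard textbook definitions and toy labels; it is not the ADE20K evaluation script, and the function names are hypothetical. The values it returns are fractions, whereas the results above are given in percent.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with true class t that were predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class matches the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping classes with no pixels."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: six "pixels" over three classes (made-up labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
```

On this toy input, four of six pixels are correct and the per-class IoUs are 1/3, 2/3, and 1/2, which shows how mIoU can sit well below pixel accuracy when errors are spread across classes.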

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)