Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ConvMLP: Hierarchical Convolutional MLPs for Vision

Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi

2021-09-09 | Image Classification | Semantic Segmentation | Instance Segmentation | Object Detection

Abstract

MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to achieve results comparable to convolutional and transformer-based methods. However, most adopt spatial MLPs, which take fixed-dimension inputs, making it difficult to apply them to downstream tasks such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks, and fully connected layers bear heavy computation. To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a light-weight, stage-wise co-design of convolution layers and MLPs. In particular, ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4G MACs (15% and 19% of MLP-Mixer-B/16, respectively). Experiments on object detection and semantic segmentation further show that the visual representation learned by ConvMLP can be seamlessly transferred and achieves competitive results with fewer parameters. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Convolutional-MLPs.
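The abstract's point about fixed-dimension inputs can be illustrated with a toy sketch. This is not the ConvMLP implementation: each "MLP" is reduced to a single weight matrix with no nonlinearity, and the helper names (`matmul`, `ones`, `spatial_mlp`, `channel_mlp`, `accepts`) and the token counts are hypothetical choices made for illustration. A spatial (token-mixing) layer multiplies along the token axis, so its weight shape is tied to the number of tokens it was built for; a channel layer multiplies along the channel axis and is indifferent to how many tokens there are.

```python
# Toy illustration only; not the ConvMLP code. All names here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows; raise on shape mismatch."""
    if len(a[0]) != len(b):
        raise ValueError(
            f"shape mismatch: {len(a)}x{len(a[0])} @ {len(b)}x{len(b[0])}"
        )
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def ones(rows, cols):
    """Build a rows x cols matrix filled with 1.0."""
    return [[1.0] * cols for _ in range(rows)]


def spatial_mlp(tokens, w_spatial):
    """Mix across token positions: (N x N) @ (N x C). N is baked into the weights."""
    return matmul(w_spatial, tokens)


def channel_mlp(tokens, w_channel):
    """Mix across channels at each position: (N x C) @ (C x C). Any N works."""
    return matmul(tokens, w_channel)


def accepts(layer, tokens, weights):
    """Return True if the layer can process the input without a shape error."""
    try:
        layer(tokens, weights)
        return True
    except ValueError:
        return False


CHANNELS = 4
TRAIN_TOKENS = 16  # assumed: a 4x4 grid of patches at training resolution

w_spatial = ones(TRAIN_TOKENS, TRAIN_TOKENS)  # weights sized for 16 tokens
w_channel = ones(CHANNELS, CHANNELS)          # weights sized for 4 channels

small_input = ones(16, CHANNELS)  # same token count as training
large_input = ones(64, CHANNELS)  # assumed: higher-resolution input, 8x8 patches

results = {
    "spatial_small": accepts(spatial_mlp, small_input, w_spatial),
    "spatial_large": accepts(spatial_mlp, large_input, w_spatial),
    "channel_small": accepts(channel_mlp, small_input, w_channel),
    "channel_large": accepts(channel_mlp, large_input, w_channel),
}
print(results)
```

In this sketch only the spatial layer rejects the larger input, which is the obstacle the abstract describes for detection and segmentation, where input resolution typically differs from the classification setting.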

Results

Task                   Dataset      Metric              Value  Model
Semantic Segmentation  ADE20K       Validation mIoU     40     ConvMLP-L
Semantic Segmentation  ADE20K       Validation mIoU     38.6   ConvMLP-M
Semantic Segmentation  ADE20K       Validation mIoU     35.8   ConvMLP-S
Image Classification   CIFAR-10     Percentage correct  98.6   ConvMLP-M
Image Classification   CIFAR-10     Percentage correct  98.6   ConvMLP-L
Image Classification   CIFAR-10     Percentage correct  98     ConvMLP-S
Image Classification   Flowers-102  Accuracy            99.5   ConvMLP-S
Image Classification   Flowers-102  Accuracy            99.5   ConvMLP-L
Image Classification   CIFAR-100    Percentage correct  89.1   ConvMLP-M
Image Classification   CIFAR-100    Percentage correct  88.6   ConvMLP-L
Image Classification   CIFAR-100    Percentage correct  87.4   ConvMLP-S
Image Classification   ImageNet     Top 1 Accuracy      76.8   ConvMLP-S

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)