
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MogaNet: Multi-order Gated Aggregation Network

Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, ZhiYuan Chen, Jiangbin Zheng, Stan Z. Li

Published: 2022-11-07
Tasks: 3D Human Pose Estimation, Image Classification, Representation Learning, Video Prediction, Semantic Segmentation, Pose Estimation, Instance Segmentation, Object Detection

Abstract

By contextualizing their kernels as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals a representation bottleneck in modern ConvNets: expressive interactions are not effectively encoded as the kernel size increases. To tackle this challenge, we propose MogaNet, a new family of modern ConvNets for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, in which discriminative features are efficiently gathered and adaptively contextualized. MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet reaches 80.0% and 87.8% accuracy on ImageNet-1K with 5.2M and 181M parameters, outperforming ParC-Net and ConvNeXt-L while using 59% fewer FLOPs and 17M fewer parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
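To make "convolutions and gated aggregation in a compact module" concrete, here is a minimal NumPy sketch of one gated aggregation step on a single (C, H, W) feature map. It is not the official MogaNet implementation (that lives in the repository linked above): the gating branch, the 1:3:4 channel split, the kernel sizes (5, 5, 7), the dilations (1, 2, 3), the SiLU activations, and every function and parameter name are illustrative assumptions.

```python
import numpy as np

# Assumed (not verified) channel split across three context branches.
SPLIT = (1, 3, 4)


def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def pointwise(x, w):
    # 1x1 convolution: mixes channels at every pixel. x: (C_in, H, W), w: (C_out, C_in)
    return np.einsum("oc,chw->ohw", w, x)


def depthwise(x, k, dilation=1):
    # Depthwise dilated convolution (cross-correlation) with 'same' zero padding.
    # x: (C, H, W); k: (C, K, K) with K odd, one kernel per channel.
    _, H, W = x.shape
    K = k.shape[-1]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(K):
        for j in range(K):
            window = xp[:, i * dilation:i * dilation + H, j * dilation:j * dilation + W]
            out += k[:, i, j][:, None, None] * window
    return out


def split_sizes(C, split=SPLIT):
    # Channel counts for the (unchanged, dilation-2, dilation-3) context groups
    c0 = C * split[0] // sum(split)
    c1 = C * split[1] // sum(split)
    return c0, c1, C - c0 - c1


def init_params(C, seed=0):
    # Hypothetical random weights, only for demonstrating shapes and data flow
    rng = np.random.default_rng(seed)
    _, c1, c2 = split_sizes(C)
    return {
        "w_gate": rng.normal(0.0, C ** -0.5, (C, C)),
        "dw5": rng.normal(0.0, 0.2, (C, 5, 5)),
        "dw5_d2": rng.normal(0.0, 0.2, (c1, 5, 5)),
        "dw7_d3": rng.normal(0.0, 0.2, (c2, 7, 7)),
        "w_ctx": rng.normal(0.0, C ** -0.5, (C, C)),
        "w_out": rng.normal(0.0, C ** -0.5, (C, C)),
    }


def gated_aggregation(x, p):
    # x: (C, H, W) feature map for a single image
    c0, c1, _ = split_sizes(x.shape[0])
    gate = silu(pointwise(x, p["w_gate"]))                    # gating branch
    ctx = depthwise(x, p["dw5"], dilation=1)                  # small receptive field
    ctx = np.concatenate([
        ctx[:c0],                                             # passed through unchanged
        depthwise(ctx[c0:c0 + c1], p["dw5_d2"], dilation=2),  # medium receptive field
        depthwise(ctx[c0 + c1:], p["dw7_d3"], dilation=3),    # large receptive field
    ])
    ctx = silu(pointwise(ctx, p["w_ctx"]))                    # fuse the three branches
    return pointwise(gate * ctx, p["w_out"])                  # gated output projection


x = np.random.default_rng(1).normal(size=(8, 16, 16))
y = gated_aggregation(x, init_params(8))
print(y.shape)  # (8, 16, 16)
```

The element-wise product lets the gate pass or suppress the aggregated context per channel and pixel; for instance, with an all-zero gating weight the output is exactly zero, because SiLU(0) = 0.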

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Prediction | Moving MNIST | MAE | 51.84 | MogaNet (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 15.67 | MogaNet (SimVP 10x) |
| Video Prediction | Moving MNIST | SSIM | 0.9661 | MogaNet (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 53.57 | VAN (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 16.21 | VAN (SimVP 10x) |
| Video Prediction | Moving MNIST | SSIM | 0.9646 | VAN (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 55.7 | HorNet (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 17.4 | HorNet (SimVP 10x) |
| Video Prediction | Moving MNIST | SSIM | 0.9624 | HorNet (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 55.76 | ConvNeXt (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 17.58 | ConvNeXt (SimVP 10x) |
| Video Prediction | Moving MNIST | SSIM | 0.9617 | ConvNeXt (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 57.52 | Uniformer (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 18.01 | Uniformer (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 59.86 | MLP-Mixer (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 18.85 | MLP-Mixer (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 59.84 | Swin (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 19.11 | Swin (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 61.65 | ViT (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 19.74 | ViT (SimVP 10x) |
| Video Prediction | Moving MNIST | SSIM | 0.9539 | ViT (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 64.31 | Poolformer (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 20.96 | Poolformer (SimVP 10x) |
| Video Prediction | Moving MNIST | MAE | 67.37 | ConvMixer (SimVP 10x) |
| Video Prediction | Moving MNIST | MSE | 22.3 | ConvMixer (SimVP 10x) |
| Semantic Segmentation | ADE20K | Validation mIoU | 54 | MogaNet-XL (UperNet) |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 1176 | MogaNet-L (UperNet) |
| Semantic Segmentation | ADE20K | Validation mIoU | 50.9 | MogaNet-L (UperNet) |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 1050 | MogaNet-B (UperNet) |
| Semantic Segmentation | ADE20K | Validation mIoU | 50.1 | MogaNet-B (UperNet) |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 946 | MogaNet-S (UperNet) |
| Semantic Segmentation | ADE20K | Validation mIoU | 49.2 | MogaNet-S (UperNet) |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 189 | MogaNet-S (Semantic FPN) |
| Semantic Segmentation | ADE20K | Validation mIoU | 47.7 | MogaNet-S (Semantic FPN) |
| Pose Estimation | COCO val2017 | AP | 77.3 | MogaNet-B (384x288) |
| Pose Estimation | COCO val2017 | AP50 | 91.4 | MogaNet-B (384x288) |
| Pose Estimation | COCO val2017 | AP75 | 84 | MogaNet-B (384x288) |
| Pose Estimation | COCO val2017 | AR | 82.2 | MogaNet-B (384x288) |
| Pose Estimation | COCO val2017 | AP | 76.4 | MogaNet-S (384x288) |
| Pose Estimation | COCO val2017 | AP50 | 91 | MogaNet-S (384x288) |
| Pose Estimation | COCO val2017 | AP75 | 83.3 | MogaNet-S (384x288) |
| Pose Estimation | COCO val2017 | AR | 81.4 | MogaNet-S (384x288) |
| Pose Estimation | COCO val2017 | AP | 74.9 | MogaNet-S (256x192) |
| Pose Estimation | COCO val2017 | AR | 80.1 | MogaNet-S (256x192) |
| Pose Estimation | COCO val2017 | AP | 73.2 | MogaNet-T (256x192) |
| Pose Estimation | COCO val2017 | AP50 | 90.1 | MogaNet-T (256x192) |
| Pose Estimation | COCO val2017 | AP75 | 81 | MogaNet-T (256x192) |
| Pose Estimation | COCO val2017 | AR | 78.8 | MogaNet-T (256x192) |
| Object Detection | COCO 2017 val | AP | 56.2 | MogaNet-XL (Cascade Mask R-CNN) |
| Object Detection | COCO 2017 val | AP | 53.3 | MogaNet-L (Cascade Mask R-CNN) |
| Object Detection | COCO 2017 val | AP | 52.6 | MogaNet-B (Cascade Mask R-CNN) |
| Object Detection | COCO 2017 val | AP | 51.6 | MogaNet-S (Cascade Mask R-CNN) |
| Object Detection | COCO 2017 val | AP | 49.4 | MogaNet-L (Mask R-CNN 1x) |
| Object Detection | COCO 2017 val | AP | 48.7 | MogaNet-L (RetinaNet 1x) |
| Object Detection | COCO 2017 val | AP | 47.9 | MogaNet-B (Mask R-CNN 1x) |
| Object Detection | COCO 2017 val | AP | 47.7 | MogaNet-B (RetinaNet 1x) |
| Object Detection | COCO 2017 val | AP | 46.7 | MogaNet-S (Mask R-CNN 1x) |
| Object Detection | COCO 2017 val | AP | 45.8 | MogaNet-S (RetinaNet 1x) |
| Object Detection | COCO 2017 val | AP | 42.6 | MogaNet-T (Mask R-CNN 1x) |
| Object Detection | COCO 2017 val | AP | 41.4 | MogaNet-T (RetinaNet 1x) |
| Object Detection | COCO 2017 val | AP | 40.7 | MogaNet-XT (Mask R-CNN 1x) |
| Object Detection | COCO 2017 val | AP | 39.7 | MogaNet-XT (RetinaNet 1x) |
| Image Classification | ImageNet | GFLOPs | 102 | MogaNet-XL (384res) |
| Image Classification | ImageNet | GFLOPs | 15.9 | MogaNet-L |
| Image Classification | ImageNet | GFLOPs | 9.9 | MogaNet-B |
| Image Classification | ImageNet | GFLOPs | 5 | MogaNet-S |
| Image Classification | ImageNet | GFLOPs | 1.44 | MogaNet-T (256res) |
| Image Classification | ImageNet | GFLOPs | 1.04 | MogaNet-XT (256res) |
| Instance Segmentation | COCO val2017 | AP50 | 90.7 | MogaNet-S (256x192) |
| Instance Segmentation | COCO val2017 | AP75 | 82.8 | MogaNet-S (256x192) |
| Instance Segmentation | COCO test-dev | mask AP | 48.8 | MogaNet-XL (Cascade Mask R-CNN) |
| Instance Segmentation | COCO test-dev | mask AP | 46.1 | MogaNet-L (Cascade Mask R-CNN) |
| Instance Segmentation | COCO test-dev | mask AP | 46 | MogaNet-B (Cascade Mask R-CNN) |
| Instance Segmentation | COCO test-dev | mask AP | 45.1 | MogaNet-S (Cascade Mask R-CNN) |
| Instance Segmentation | COCO test-dev | mask AP | 44.1 | MogaNet-L (Mask R-CNN 1x) |
| Instance Segmentation | COCO test-dev | mask AP | 43.2 | MogaNet-B (Mask R-CNN 1x) |
| Instance Segmentation | COCO test-dev | mask AP | 42.2 | MogaNet-S (Mask R-CNN 1x) |
| Instance Segmentation | COCO test-dev | mask AP | 39.1 | MogaNet-T (Mask R-CNN 1x) |
| Instance Segmentation | COCO test-dev | mask AP | 37.6 | MogaNet-XT |
| Instance Segmentation | COCO test-dev | mask AP | 35.8 | MogaNet-T |
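If pixel intensities are scaled to [0, 1], a per-pixel MSE cannot exceed 1, so Moving MNIST values such as MSE 15.67 point to a per-frame convention: squared (or absolute) errors are summed over all pixels of a frame and then averaged over frames. That this is how the SimVP-style numbers above were computed is our assumption, not something this page states; check the benchmark's evaluation code before comparing results. A minimal pure-Python sketch of the assumed convention (the function name is hypothetical):

```python
from typing import Sequence, Tuple

Frame = Sequence[Sequence[float]]  # one grayscale frame as rows of pixel values in [0, 1]


def per_frame_errors(pred: Sequence[Frame], true: Sequence[Frame]) -> Tuple[float, float]:
    """Return (MSE, MAE): squared / absolute pixel errors summed within each frame,
    then averaged over frames (assumed SimVP-style convention)."""
    if not pred or len(pred) != len(true):
        raise ValueError("pred and true must contain the same, non-zero number of frames")
    sq_total = abs_total = 0.0
    for frame_p, frame_t in zip(pred, true):
        for row_p, row_t in zip(frame_p, frame_t):
            for a, b in zip(row_p, row_t):
                d = a - b
                sq_total += d * d
                abs_total += abs(d)
    n_frames = len(pred)
    return sq_total / n_frames, abs_total / n_frames


pred = [[[0.5, 0.5], [0.5, 0.5]], [[0.0, 0.0], [0.0, 0.0]]]
true = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
mse, mae = per_frame_errors(pred, true)
print(mse, mae)  # 0.5 1.0
```

Dividing the same example by its 4 pixels per frame instead would give a per-pixel MSE of 0.125; for 64x64 Moving MNIST frames the two conventions differ by a factor of 4096, which is why the convention must match before numbers are compared.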

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)