Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Active Token Mixer

Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

2022-03-11
Tasks: Image Classification, Semantic Segmentation, Instance Segmentation, Object Detection

Abstract

The three existing dominant network families, i.e., CNNs, Transformers, and MLPs, differ from each other mainly in the ways of fusing spatial contextual information, leaving designing more effective token-mixing mechanisms at the core of backbone architecture development. In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate flexible contextual information distributed across different channels from other tokens into the given query token. This fundamental operator actively predicts where to capture useful contexts and learns how to fuse the captured contexts with the query token at channel level. In this way, the spatial range of token-mixing can be expanded to a global scope with limited computational complexity, where the way of token-mixing is reformed. We take ATM as the primary operator and assemble ATMs into a cascade architecture, dubbed ATMNet. Extensive experiments demonstrate that ATMNet is generally applicable and comprehensively surpasses different families of SOTA vision backbones by a clear margin on a broad range of vision tasks, including visual recognition and dense prediction tasks. Code is available at https://github.com/microsoft/ActiveMLP.
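The abstract describes ATM as doing two things per query token: predicting, channel by channel, where to fetch context from, and learning how to fuse that context with the query. The sketch below illustrates that idea under heavy simplifying assumptions and is not the official implementation (see the repository linked above). It works on a single spatial axis (the paper operates on 2-D feature maps), predicts offsets with a hypothetical linear layer (`w_off`, `b_off`), handles fractional offsets with linear interpolation, and fuses context and identity with a static per-channel softmax (`fuse_logits`); all of these names and choices are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def sample_channel(x: Matrix, pos: float, c: int) -> float:
    """Linearly interpolate channel c of x at a fractional token position.

    Positions outside [0, len(x) - 1] are clamped to the nearest valid token.
    """
    n = len(x)
    pos = min(max(pos, 0.0), float(n - 1))
    lo = int(math.floor(pos))
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return (1.0 - frac) * x[lo][c] + frac * x[hi][c]


def predict_offsets(x: Matrix, w_off: Matrix, b_off: List[float]) -> Matrix:
    """Hypothetical offset predictor: a linear map from each query token to one
    real-valued offset per channel (offset[i][c] = x[i] . w_off[:, c] + b_off[c])."""
    channels = len(b_off)
    return [
        [sum(tok[k] * w_off[k][c] for k in range(len(tok))) + b_off[c]
         for c in range(channels)]
        for tok in x
    ]


def active_mix_1d(x: Matrix, w_off: Matrix, b_off: List[float],
                  fuse_logits: Matrix) -> Matrix:
    """1-D sketch of channel-wise active token mixing (assumed simplification).

    For every token i and channel c: predict where to look (i + offset),
    fetch channel c from that position, then blend the fetched context with
    the query's own channel value using a per-channel two-way softmax
    (fuse_logits[c] = [context_logit, identity_logit]).
    """
    offsets = predict_offsets(x, w_off, b_off)
    out: Matrix = []
    for i, tok in enumerate(x):
        mixed = []
        for c, own in enumerate(tok):
            context = sample_channel(x, i + offsets[i][c], c)
            a, b = fuse_logits[c]
            m = max(a, b)  # subtract the max for a numerically stable softmax
            ea, eb = math.exp(a - m), math.exp(b - m)
            w_ctx = ea / (ea + eb)
            mixed.append(w_ctx * context + (1.0 - w_ctx) * own)
        out.append(mixed)
    return out


if __name__ == "__main__":
    x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    w_off = [[0.0, 0.0], [0.0, 0.0]]  # offsets ignore token content in this demo
    b_off = [1.0, 0.0]                # channel 0 looks one token right, channel 1 stays put
    fuse = [[0.0, 0.0], [0.0, 0.0]]   # equal weight for context and identity
    print(active_mix_1d(x, w_off, b_off, fuse))  # [[1.5, 10.0], [2.5, 20.0], [3.0, 30.0]]
```

With equal fusion logits, channel 0 of each output averages the token with its right-hand neighbour (the last token clamps to itself), while channel 1 is left unchanged, so different channels of the same query draw context from different places. A 2-D variant could plausibly run one such branch per axis plus an identity branch and fuse them, but that extension is an assumption here, not a description of ATMNet.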

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K | Params (M) | 108 | ActiveMLP-L (UperNet) |
| Semantic Segmentation | ADE20K | Validation mIoU | 51.1 | ActiveMLP-L (UperNet) |
| Object Detection | COCO minival | box AP | 52.3 | ActiveMLP-B (Cascade Mask R-CNN) |
| Image Classification | ImageNet | GFLOPs | 36.4 | ActiveMLP-L |
| Image Classification | ImageNet | GFLOPs | 4 | ActiveMLP-T |
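The GFLOPs above come from the full ATMNet models and cannot be derived from the abstract alone. For intuition only about the abstract's claim that mixing can reach a global range at limited cost, here is a crude multiply-accumulate count comparing the token-mixing step of single-head self-attention with a hypothetical per-channel sampling mixer (one C×C offset-predicting linear layer plus 2-tap interpolation per token and channel). Both cost models are our own simplifying assumptions, not the paper's FLOP accounting.

```python
def self_attention_macs(num_tokens: int, channels: int) -> int:
    """Rough MAC count for the mixing part of single-head self-attention:
    QK^T scores plus the attention-weighted sum of values.
    Ignores the Q/K/V/output projections and the softmax."""
    return 2 * num_tokens * num_tokens * channels


def active_mixer_macs(num_tokens: int, channels: int) -> int:
    """Rough MAC count for a hypothetical per-channel sampling mixer:
    one CxC linear layer predicting an offset per channel, plus 2-tap
    linear interpolation per (token, channel) pair."""
    offset_layer = num_tokens * channels * channels
    interpolation = 2 * num_tokens * channels
    return offset_layer + interpolation


if __name__ == "__main__":
    channels = 64
    for side in (14, 28, 56):
        n = side * side  # tokens in a side x side feature map
        print(f"{side}x{side} tokens: attention ~{self_attention_macs(n, channels):,} MACs, "
              f"sampling mixer ~{active_mixer_macs(n, channels):,} MACs")
```

Under these assumptions, doubling the token count quadruples the attention estimate but only doubles the sampling-mixer estimate; the mixer's C×C term means its cost grows with channel width rather than with spatial resolution.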

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)