Active Token Mixer

Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

2022-03-11

Tasks: Image Classification, Semantic Segmentation, Instance Segmentation, Object Detection

Abstract

The three dominant network families, i.e., CNNs, Transformers, and MLPs, differ from each other mainly in how they fuse spatial contextual information, which places the design of more effective token-mixing mechanisms at the core of backbone architecture development. In this work, we propose an innovative token mixer, dubbed Active Token Mixer (ATM), that actively incorporates flexible contextual information, distributed across different channels, from other tokens into the given query token. This fundamental operator actively predicts where to capture useful contexts and learns how to fuse the captured contexts with the query token at the channel level. In this way, the spatial range of token mixing can be expanded to a global scope with limited computational complexity, and the way of token mixing is reformed. We take ATM as the primary operator and assemble ATMs into a cascade architecture, dubbed ATMNet. Extensive experiments demonstrate that ATMNet is generally applicable and comprehensively surpasses different families of SOTA vision backbones by a clear margin on a broad range of vision tasks, including visual recognition and dense prediction tasks. Code is available at https://github.com/microsoft/ActiveMLP.
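To make the idea of "predicting where to capture context, then fusing per channel" concrete, here is a minimal sketch on a 1D token sequence in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear offset predictor, the rounding of offsets to integers, and the fixed fusion weights `alpha` are all hypothetical simplifications (a trainable version would need differentiable sampling and learned weights). See the linked repository for the actual code.

```python
def predict_offsets(query, weight, bias):
    """Hypothetical linear offset predictor: one integer offset per channel."""
    return [
        round(sum(w * q for w, q in zip(row, query)) + b)
        for row, b in zip(weight, bias)
    ]


def active_token_mix(tokens, weight, bias, alpha):
    """Mix each token with per-channel context gathered at predicted offsets.

    tokens: list of N tokens, each a list of C floats (a 1D token sequence).
    weight, bias: parameters of the hypothetical offset predictor (C x C, C).
    alpha: per-channel fusion weights in [0, 1] (fixed here for illustration).
    """
    n = len(tokens)
    mixed = []
    for i, query in enumerate(tokens):
        offsets = predict_offsets(query, weight, bias)
        out = []
        for c, off in enumerate(offsets):
            j = min(max(i + off, 0), n - 1)  # clamp to the sequence bounds
            context = tokens[j][c]           # channel c is read from token j
            out.append(alpha[c] * query[c] + (1 - alpha[c]) * context)
        mixed.append(out)
    return mixed


tokens = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
weight = [[0.0, 0.0], [0.0, 0.0]]  # zero weights: offsets come from the bias alone
bias = [1.0, -1.0]                 # channel 0 looks one token right, channel 1 one token left
alpha = [0.5, 0.5]
mixed = active_token_mix(tokens, weight, bias, alpha)
print(mixed)  # [[1.5, 10.0], [2.5, 15.0], [3.0, 25.0]]
```

In this toy setup each channel of a query token reads its context from a different position, which is the sense in which context is "distributed across different channels". Because the offsets are predicted rather than fixed, the reachable range is not tied to a local window.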

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	ADE20K	Params (M)	108	ActiveMLP-L (UperNet)
Semantic Segmentation	ADE20K	Validation mIoU	51.1	ActiveMLP-L (UperNet)
Object Detection	COCO minival	box AP	52.3	ActiveMLP-B (Cascade Mask R-CNN)
Image Classification	ImageNet	GFLOPs	36.4	ActiveMLP-L
Image Classification	ImageNet	GFLOPs	4	ActiveMLP-T
