Image Classification on ImageNet

Metric: GFLOPs (higher is better)

Results

#	Model	GFLOPs	Extra Data	Paper	Date	Code
1	InternImage-H	1478	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
2	DaViT-G	1038	No	DaViT: Dual Attention Vision Transformers	2022-04-07	Code
3	SWAG (ViT H/14)	1018.8	No	Revisiting Weakly Supervised Pre-Training of Vis...	2022-01-20	Code
4	MViTv2-H (512 res, ImageNet-21k pretrain)	763.5	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
5	Perceiver (FF)	707.2	No	Perceiver: General Perception with Iterative Att...	2021-03-04	Code
6	MOAT-4 22K+1K	648.5	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
7	DY-MobileNetV2 ×1.0	626	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
8	FixEfficientNet-L2	585	No	Fixing the train-test resolution discrepancy: Fi...	2020-03-18	Code
9	MambaVision-L3	489.1	No	MambaVision: A Hybrid Mamba-Transformer Vision B...	2024-07-10	Code
10	ELSA-VOLO-D5 (512*512)	437	No	ELSA: Enhanced Local Self-Attention for Vision T...	2021-12-23	Code
11	XCiT-L24	417.9	No	XCiT: Cross-Covariance Image Transformers	2021-06-17	Code
12	VOLO-D5+HAT	412	No	Improving Vision Transformers by Revisiting High...	2022-04-03	Code
13	VOLO-D5	412	No	VOLO: Vision Outlooker for Visual Recognition	2021-06-24	Code
14	CaiT-M-48-448	377.3	No	Going deeper with Image Transformers	2021-03-31	Code
15	NFNet-F6 w/ SAM	377.28	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
16	NFNet-F4+	367	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
17	DaViT-H	334	No	DaViT: Dual Attention Vision Transformers	2022-04-07	Code
18	ResNeXt-101 32x48d	306	No	Exploring the Limits of Weakly Supervised Pretra...	2018-05-02	Code
19	NFNet-F5 w/ SAM	289.76	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
20	NFNet-F5	289.76	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
21	MOAT-3 1K only	271	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
22	CAIT-M36-448	247.8	No	Going deeper with Image Transformers	2021-03-31	Code
23	NFNet-F4	215.24	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
24	LV-ViT-L	214.8	No	All Tokens Matter: Token Labeling for Training B...	2021-04-22	Code
25	AmoebaNet-A	208	No	Regularized Evolution for Image Classifier Archi...	2018-02-05	Code
26	VOLO-D4	197	No	VOLO: Vision Outlooker for Visual Recognition	2021-06-24	Code
27	ViT-L	191.2	No	DeiT III: Revenge of the ViT	2022-04-14	Code
28	XCiT-M24	188	No	XCiT: Cross-Covariance Image Transformers	2021-06-17	Code
29	ConvNeXt-XL (ImageNet-22k)	179	No	A ConvNet for the 2020s	2022-01-10	Code
30	ResNeXt-101 32x32d	174	No	Exploring the Limits of Weakly Supervised Pretra...	2018-05-02	Code
31	CAIT-M-36	173.3	No	Going deeper with Image Transformers	2021-03-31	Code
32	InternImage-XL	163	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
33	FasterViT-6	142	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
34	MViTv2-L (384 res, ImageNet-21k pretrain)	140.7	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
35	MViTv2-L (384 res)	140.2	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
36	RepLKNet-XL	128.7	No	Scaling Up Your Kernels to 31x31: Revisiting Lar...	2022-03-13	Code
37	MViTv2-H (ImageNet-21k pretrain)	120.6	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
38	CAIT-M-24	116.1	No	Going deeper with Image Transformers	2021-03-31	Code
39	NFNet-F3	114.76	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
40	VAN-B6 (22K, 384res)	114.3	No	Visual Attention Network	2022-02-20	Code
41	CoAtNet-3 @384	114	No	CoAtNet: Marrying Convolution and Attention for ...	2021-06-09	Code
42	FasterViT-5	113	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
43	InternImage-L	108	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
44	XCiT-S24	106	No	XCiT: Cross-Covariance Image Transformers	2021-06-17	Code
45	Swin-L	103.9	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
46	DaViT-L (ImageNet-22k)	103	No	DaViT: Dual Attention Vision Transformers	2022-04-07	Code
47	MogaNet-XL (384res)	102	No	MogaNet: Multi-order Gated Aggregation Network	2022-11-07	Code
48	HorNet-L (GF)	101.8	No	HorNet: Efficient High-Order Spatial Interaction...	2022-07-28	Code
49	DiNAT_s-Large (384res; Pretrained on IN22K@224)	101.5	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
50	ConvNeXt-L (384 res)	101	No	A ConvNet for the 2020s	2022-01-10	Code
51	Mini-Swin-B@384	98.8	No	MiniViT: Compressing Vision Transformers with We...	2022-04-14	Code
52	CSWin-L (384 res,ImageNet-22k pretrain)	96.8	No	CSWin Transformer: A General Vision Transformer ...	2021-07-01	Code
53	EfficientNetV2-XL (21k)	94	Yes	EfficientNetV2: Smaller Models and Faster Training	2021-04-01	Code
54	DiNAT-Large (11x11ks; 384res; Pretrained on IN22K@224)	92.4	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
55	DiNAT-Large (384x384; Pretrained on ImageNet-22K @ 224x224)	89.7	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
56	FixEfficientNet-B7	82	No	Fixing the train-test resolution discrepancy: Fi...	2020-03-18	Code
57	CAFormer-B36 (384 res, 21K)	72.2	No	MetaFormer Baselines for Vision	2022-10-24	Code
58	CAFormer-B36 (384 res)	72.2	No	MetaFormer Baselines for Vision	2022-10-24	Code
59	ResNeXt-101 32×16d	72	No	Exploring the Limits of Weakly Supervised Pretra...	2018-05-02	Code
60	VOLO-D3	67.9	No	VOLO: Vision Outlooker for Visual Recognition	2021-06-24	Code
61	MIRL (ViT-B-48)	67	No	Masked Image Residual Learning for Scaling Deepe...	2023-09-25	Code
62	ConvFormer-B36 (384 res, 21K)	66.5	No	MetaFormer Baselines for Vision	2022-10-24	Code
63	ConvFormer-B36 (384 res)	66.5	No	MetaFormer Baselines for Vision	2022-10-24	Code
64	CAIT-S-48	63.8	No	Going deeper with Image Transformers	2021-03-31	Code
65	NFNet-F2	62.59	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
66	SE-ResNeXt-101, 64x4d, S=2(416px)	61.1	No	Towards Better Accuracy-efficiency Trade-offs: D...	2020-11-30	Code
67	CLCNet (S:ViT+D:VOLO-D3) (retrain)	57.46	No	CLCNet: Rethinking of Ensemble Modeling with Cla...	2022-05-19	Code
68	TransNeXt-Base (IN-1K supervised, 384)	56.3	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
69	XCiT-S12	55.6	No	XCiT: Cross-Covariance Image Transformers	2021-06-17	Code
70	ResNet-RS-270 (256 image res)	54	No	Revisiting ResNets: Improved Training and Scalin...	2021-03-13	Code
71	EfficientNetV2-L (21k)	53	No	EfficientNetV2: Smaller Models and Faster Training	2021-04-01	Code
72	EfficientNetV2-L	53	No	EfficientNetV2: Smaller Models and Faster Training	2021-04-01	Code
73	CLCNet (S:ViT+D:EffNet-B7) (retrain)	51.93	No	CLCNet: Rethinking of Ensemble Modeling with Cla...	2022-05-19	Code
74	UniNet-B6	51	No	UniNet: Unified Architecture Search with Convolu...	2022-07-12	Code
75	Sequencer2D-L↑392	50.7	No	Sequencer: Deep LSTM for Image Classification	2022-05-04	Code
76	VAN-B5 (22K, 384res)	50.6	No	Visual Attention Network	2022-02-20	Code
77	PNASNet-5	50	No	Progressive Neural Architecture Search	2017-12-02	Code
78	DAT-B (384 res, IN-1K only)	49.8	No	Vision Transformer with Deformable Attention	2022-01-03	Code
79	DAT-B++ (384x384)	49.7	No	DAT++: Spatially Dynamic Vision Transformer with...	2023-09-04	Code
80	CAIT-S-36	48	No	Going deeper with Image Transformers	2021-03-31	Code
81	CLCNet (S:D1+D:D5)	47.43	No	CLCNet: Rethinking of Ensemble Modeling with Cla...	2022-05-19	Code
82	Swin-B	47	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
83	Conformer-B	46.6	No	Conformer: Local Features Coupling Global Repres...	2021-05-09	Code
84	DaViT-B (ImageNet-22k)	46.4	No	DaViT: Dual Attention Vision Transformers	2022-04-07	Code
85	CLCNet (S:ConvNeXt-L+D:EffNet-B7) (retrain)	45.43	No	CLCNet: Rethinking of Ensemble Modeling with Cla...	2022-05-19	Code
86	MaxViT-L (224res)	43.9	No	MaxViT: Multi-Axis Vision Transformer	2022-04-04	Code
87	SReT-S (512 res, ImageNet-1K only)	42.8	No	Sliced Recursive Transformer	2021-11-09	Code
88	CAFormer-M36 (384 res, 21K)	42	No	MetaFormer Baselines for Vision	2022-10-24	Code
89	CAFormer-M36 (384 res)	42	No	MetaFormer Baselines for Vision	2022-10-24	Code
90	LITv2-B|384	39.7	No	Fast Vision Transformers with HiLo Attention	2022-05-26	Code
91	UniFormer-L (384 res)	39.2	No	UniFormer: Unifying Convolution and Self-attenti...	2022-01-24	Code
92	VAN-B6 (22K)	38.9	No	Visual Attention Network	2022-02-20	Code
93	SE-ResNeXt-101, 64x4d, S=2(320px)	38.2	No	Towards Better Accuracy-efficiency Trade-offs: D...	2020-11-30	Code
94	RevBiFPN-S6	38.1	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
95	ConvFormer-M36 (384 res, 21K)	37.7	No	MetaFormer Baselines for Vision	2022-10-24	Code
96	ConvFormer-M36 (384 res)	37.7	No	MetaFormer Baselines for Vision	2022-10-24	Code
97	NoisyStudent (EfficientNet-B7)	37	No	Self-training with Noisy Student improves ImageN...	2019-11-11	Code
98	EfficientNet-B7	37	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
99	FasterViT-4	36.6	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
100	ActiveMLP-L	36.4	No	Active Token Mixer	2022-03-11	Code
101	VAN-B4 (22K, 384res)	35.9	No	Visual Attention Network	2022-02-20	Code
102	NFNet-F1	35.54	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
103	DeiT-B with iRPE-K	35.368	No	Rethinking and Improving Relative Position Encod...	2021-07-29	Code
104	MambaVision-L	34.9	No	MambaVision: A Hybrid Mamba-Transformer Vision B...	2024-07-10	Code
105	RDNet-L (384 res)	34.7	No	DenseNets Reloaded: Paradigm Shift Beyond ResNet...	2024-03-28	Code
106	RDNet-L	34.7	No	DenseNets Reloaded: Paradigm Shift Beyond ResNet...	2024-03-28	Code
107	CoAtNet-3	34.7	No	CoAtNet: Marrying Convolution and Attention for ...	2021-06-09	Code
108	DiNAT_s-Large (224x224; Pretrained on ImageNet-22K @ 224x224)	34.5	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
109	T2T-ViT-14|384	34.2	No	Tokens-to-Token ViT: Training Vision Transformer...	2021-01-28	Code
110	MViT-B-24	32.7	No	Multiscale Vision Transformers	2021-04-22	Code
111	CAIT-S-24	32.2	No	Going deeper with Image Transformers	2021-03-31	Code
112	TransNeXt-Small (IN-1K supervised, 384)	32.1	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
113	Next-ViT-L @384	32	No	Next-ViT: Next Generation Vision Transformer for...	2022-07-12	Code
114	VVT-L (384 res)	31.8	No	Vicinity Vision Transformer	2022-06-21	Code
115	gMLP-B	31.6	No	Pay Attention to MLPs	2021-05-17	Code
116	ResNeXt-101 64x4	31.5	No	Aggregated Residual Transformations for Deep Neu...	2016-11-16	Code
117	Harm-SE-RNX-101 64x4d (320x320, Mean-Max Pooling)	31.4	No	Harmonic Convolutional Networks based on Discret...	2020-01-18	Code
118	TinySaver(ConvNeXtV2_h, 0.01 Acc drop)	31.17	No	Tiny Models are the Computational Saver for Larg...	2024-03-26	Code
119	T2T-ViTt-24	30	No	Tokens-to-Token ViT: Training Vision Transformer...	2021-01-28	Code
120	ConViT-B+	30	No	ConViT: Improving Vision Transformers with Soft ...	2021-03-19	Code
121	CAIT-XS-36	28.8	No	Going deeper with Image Transformers	2021-03-31	Code
122	ViTAE-B-Stage	27.6	No	ViTAE: Vision Transformer Advanced by Exploring ...	2021-06-07	Code
123	T2T-ViT-24	27.6	No	Tokens-to-Token ViT: Training Vision Transformer...	2021-01-28	Code
124	TinyViT-21M-512-distill (512 res, 21k)	27	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
125	SE-CoTNetD-152	26.5	No	Contextual Transformer Networks for Visual Recog...	2021-07-26	Code
126	CAFormer-S36 (384 res, 21K)	26	No	MetaFormer Baselines for Vision	2022-10-24	Code
127	CAFormer-S36 (384 res)	26	No	MetaFormer Baselines for Vision	2022-10-24	Code
128	CvT-21 (384 res, ImageNet-22k pretrain)	25	No	CvT: Introducing Convolutions to Vision Transfor...	2021-03-29	Code
129	CvT-21 (384 res)	24.9	No	CvT: Introducing Convolutions to Vision Transfor...	2021-03-29	Code
130	ResMLP-B24 + STD	24.1	No	-	-	Code
131	EfficientNetV2-M (21k)	24	No	EfficientNetV2: Smaller Models and Faster Training	2021-04-01	Code
132	NASNET-A(6)	23.8	No	Learning Transferable Architectures for Scalable...	2017-07-21	Code
133	MaxViT-B (224res)	23.4	No	MaxViT: Multi-Axis Vision Transformer	2022-04-04	Code
134	CAFormer-B36 (224 res, 21K)	23.2	No	MetaFormer Baselines for Vision	2022-10-24	Code
135	CAFormer-B36 (224 res)	23.2	No	MetaFormer Baselines for Vision	2022-10-24	Code
136	UniNet-B5	23.2	No	UniNet: Unified Architecture Search with Convolu...	2021-10-08	-
137	MetaFormer PoolFormer-M48	23.2	No	MetaFormer Is Actually What You Need for Vision	2021-11-22	Code
138	ConvFormer-B36 (224 res, 21K)	22.6	No	MetaFormer Baselines for Vision	2022-10-24	Code
139	ConvFormer-B36 (224 res)	22.6	No	MetaFormer Baselines for Vision	2022-10-24	Code
140	ConvFormer-S36 (384 res, 21K)	22.4	No	MetaFormer Baselines for Vision	2022-10-24	Code
141	ConvFormer-S36 (384 res)	22.4	No	MetaFormer Baselines for Vision	2022-10-24	Code
142	Oct-ResNet-152 (SE)	22.2	No	Drop an Octave: Reducing Spatial Redundancy in C...	2019-04-10	Code
143	RevBiFPN-S5	21.8	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
144	UniNet-B5	20.4	No	UniNet: Unified Architecture Search with Convolu...	2022-07-12	Code
145	EfficientViT-L2 (r384)	20	No	EfficientViT: Multi-Scale Linear Attention for H...	2022-05-29	Code
146	T2T-ViTt-19	19.6	No	Tokens-to-Token ViT: Training Vision Transformer...	2021-01-28	Code
147	TinySaver(ConvNeXtV2_h, 0.5 Acc drop)	19.41	No	Tiny Models are the Computational Saver for Larg...	2024-03-26	Code
148	CAIT-XS-24	19.3	No	Going deeper with Image Transformers	2021-03-31	Code
149	BoTNet T5	19.3	No	Bottleneck Transformers for Visual Recognition	2021-01-27	Code
150	EfficientNet-B6	19	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
151	MIRL(ViT-S-54)	18.8	No	Masked Image Residual Learning for Scaling Deepe...	2023-09-25	Code
152	ResNeXt-101, 64x4d, S=2(224px)	18.8	No	Towards Better Accuracy-efficiency Trade-offs: D...	2020-11-30	Code
153	CLCNet (S:B4+D:B7)	18.58	No	CLCNet: Rethinking of Ensemble Modeling with Cla...	2022-05-19	Code
154	SReT-S (384 res, ImageNet-1K only)	18.5	No	Sliced Recursive Transformer	2021-11-09	Code
155	RepVGG-B2	18.4	No	RepVGG: Making VGG-style ConvNets Great Again	2021-01-11	Code
156	FasterViT-3	18.2	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
157	Transformer local-attention (NesT-B)	17.9	No	Nested Hierarchical Transformer: Towards Accurat...	2021-05-26	Code
158	RVT-B*	17.7	No	Towards Robust Vision Transformer	2021-05-17	Code
159	VAN-B5 (22K)	17.2	No	Visual Attention Network	2022-02-20	Code
160	KAT-B*	17.06	No	Kolmogorov-Arnold Transformer	2024-09-16	Code
161	ConViT-B	17	No	ConViT: Improving Vision Transformers with Soft ...	2021-03-19	Code
162	GLiT-Bases	17	No	GLiT: Neural Architecture Search for Global and ...	2021-07-07	Code
163	T2T-ViT-19	17	No	Tokens-to-Token ViT: Training Vision Transformer...	2021-01-28	Code
164	DeiT-B	16.87	No	Kolmogorov-Arnold Transformer	2024-09-16	Code
165	ViT-B/16	16.87	No	Kolmogorov-Arnold Transformer	2024-09-16	Code
166	Pyramid ViG-B	16.8	No	Vision GNN: An Image is Worth Graph of Nodes	2022-06-01	Code
167	DAT-B++ (224x224)	16.6	No	DAT++: Spatially Dynamic Vision Transformer with...	2023-09-04	Code
168	Sequencer2D-L	16.6	No	Sequencer: Deep LSTM for Image Classification	2022-05-04	Code
169	MixMIM-B	16.3	No	MixMAE: Mixed and Masked Autoencoder for Efficie...	2022-05-26	Code
170	CvT-13 (384 res)	16.3	No	CvT: Introducing Convolutions to Vision Transfor...	2021-03-29	Code
171	InternImage-B	16	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
172	LV-ViT-M	16	No	All Tokens Matter: Token Labeling for Training B...	2021-04-22	Code
173	MogaNet-L	15.9	No	MogaNet: Multi-order Gated Aggregation Network	2022-11-07	Code
174	Assemble-ResNet152	15.8	No	Compounding the Performance Improvements of Asse...	2020-01-17	Code
175	BossNet-T1	15.8	No	BossNAS: Exploring Hybrid CNN-transformers with ...	2021-03-23	Code
176	CoAtNet-2	15.7	No	CoAtNet: Marrying Convolution and Attention for ...	2021-06-09	Code
177	DaViT-B	15.5	No	DaViT: Dual Attention Vision Transformers	2022-04-07	Code
178	ViT-S @384 (DeiT III)	15.5	No	DeiT III: Revenge of the ViT	2022-04-14	Code
179	RDNet-B	15.4	No	DenseNets Reloaded: Paradigm Shift Beyond ResNet...	2024-03-28	Code
180	DeepMAD-89M	15.4	No	DeepMAD: Mathematical Architecture Design for De...	2023-03-05	Code
181	Shift-B	15.2	No	When Shift Operation Meets Vision Transformer: A...	2022-01-26	Code
182	Twins-SVT-L	15.1	No	Twins: Revisiting the Design of Spatial Attentio...	2021-04-28	Code
183	MambaVision-B	15	No	MambaVision: A Hybrid Mamba-Transformer Vision B...	2024-07-10	Code
184	Wave-ViT-L	14.8	No	Wave-ViT: Unifying Wavelet and Transformers for ...	2022-07-11	Code
185	GC ViT-B	14.8	No	Global Context Vision Transformers	2022-06-20	Code
186	CAIT-XXS-36	14.3	No	Going deeper with Image Transformers	2021-03-31	Code
187	ZenNAS (0.8ms)	13.9	No	Zen-NAS: A Zero-Shot NAS for High-Performance De...	2021-02-01	Code
188	TinyViT-21M-384-distill (384 res, 21k)	13.8	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
189	DiNAT-Base	13.7	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
190	NAT-Base	13.7	No	Neighborhood Attention Transformer	2022-04-14	Code
191	HRFormer-B	13.7	No	HRFormer: High-Resolution Transformer for Dense ...	2021-10-18	Code
192	CAFormer-S18 (384 res, 21K)	13.4	No	MetaFormer Baselines for Vision	2022-10-24	Code
193	CAFormer-S18 (384 res)	13.4	No	MetaFormer Baselines for Vision	2022-10-24	Code
194	ViL-Base-D	13.4	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
195	CAFormer-M36 (224 res, 21K)	13.2	No	MetaFormer Baselines for Vision	2022-10-24	Code
196	CAFormer-M36 (224 res)	13.2	No	MetaFormer Baselines for Vision	2022-10-24	Code
197	LITv2-B	13.2	No	Fast Vision Transformers with HiLo Attention	2022-05-26	Code
198	GTP-DeiT-B/P8	13.1	No	GTP-ViT: Efficient Vision Transformers via Graph...	2023-11-06	Code
199	CeiT-S (384 finetune res)	12.9	No	Incorporating Convolution Designs into Visual Tr...	2021-03-22	Code
200	ConvFormer-M36 (224 res, 21K)	12.8	No	MetaFormer Baselines for Vision	2022-10-24	Code
201	ConvFormer-M36 (224 res)	12.8	No	MetaFormer Baselines for Vision	2022-10-24	Code
202	UniFormer-L	12.6	No	UniFormer: Unifying Convolution and Self-attenti...	2022-01-24	Code
203	PiT-B	12.5	No	Rethinking Spatial Dimensions of Vision Transfor...	2021-03-30	Code
204	NFNet-F0	12.38	No	High-Performance Large-Scale Image Recognition W...	2021-02-11	Code
205	CycleMLP-B5	12.3	No	CycleMLP: A MLP-like Architecture for Dense Pred...	2021-07-21	Code
206	VAN-B4 (22K)	12.2	No	Visual Attention Network	2022-02-20	Code
207	ViTAE-S-Stage	12	No	ViTAE: Vision Transformer Advanced by Exploring ...	2021-06-07	Code
208	PVTv2-B4	11.8	No	PVT v2: Improved Baselines with Pyramid Vision T...	2021-06-25	Code
209	MaxViT-S (224res)	11.7	No	MaxViT: Multi-Axis Vision Transformer	2022-04-04	Code
210	ConvFormer-S18 (384 res, 21K)	11.6	No	MetaFormer Baselines for Vision	2022-10-24	Code
211	ConvFormer-S18 (384 res)	11.6	No	MetaFormer Baselines for Vision	2022-10-24	Code
212	ResNet-152	11.3	No	Deep Residual Learning for Image Recognition	2015-12-10	Code
213	RepVGG-B2g4	11.3	No	RepVGG: Making VGG-style ConvNets Great Again	2021-01-11	Code
214	ScaleNet-152	11.2	No	Data-Driven Neuron Allocation for Scale Aggregat...	2019-04-20	Code
215	Sequencer2D-M	11.1	No	Sequencer: Deep LSTM for Image Classification	2022-05-04	Code
216	CCT-14/7x2	11.06	No	Escaping the Big Data Paradigm with Compact Tran...	2021-04-12	Code
217	EfficientViT-L2 (r288)	11	No	EfficientViT: Multi-Scale Linear Attention for H...	2022-05-29	Code
218	AutoFormer-base	11	No	AutoFormer: Searching Transformers for Visual Re...	2021-07-01	Code
219	BoTNet T4	10.9	No	Bottleneck Transformers for Visual Recognition	2021-01-27	Code
220	ECA-Net (ResNet-152)	10.83	No	ECA-Net: Efficient Channel Attention for Deep Co...	2019-10-08	Code
221	VVT-L (224 res)	10.8	No	Vicinity Vision Transformer	2022-06-21	Code
222	RevBiFPN-S4	10.6	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
223	Transformer local-attention (NesT-S)	10.4	No	Nested Hierarchical Transformer: Towards Accurat...	2021-05-26	Code
224	TransNeXt-Small (IN-1K supervised, 224)	10.3	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
225	ConViT-S+	10	No	ConViT: Improving Vision Transformers with Soft ...	2021-03-19	Code
226	MogaNet-B	9.9	No	MogaNet: Multi-order Gated Aggregation Network	2022-11-07	Code
227	UniNet-B4	9.9	No	UniNet: Unified Architecture Search with Convolu...	2021-10-08	-
228	EfficientNet-B5	9.9	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
229	DeiT-S with iRPE-QKV	9.77	No	Rethinking and Improving Relative Position Encod...	2021-07-29	Code
230	QnA-ViT-Base	9.7	No	Learned Queries for Efficient Local Attention	2021-12-21	Code
231	T2T-ViT-14	9.6	No	Tokens-to-Token ViT: Training Vision Transformer...	2021-01-28	Code
232	CAIT-XXS-24	9.6	No	Going deeper with Image Transformers	2021-03-31	Code
233	CrossViT-18+	9.5	No	CrossViT: Cross-Attention Multi-Scale Vision Tra...	2021-03-27	Code
234	DeiT-S with iRPE-QK	9.412	No	Rethinking and Improving Relative Position Encod...	2021-07-29	Code
235	DAT-S++	9.4	No	DAT++: Spatially Dynamic Vision Transformer with...	2023-09-04	Code
236	CentroidViT-S (arXiv, 2021-02)	9.4	No	Centroid Transformers: Learning to Abstract with...	2021-02-17	-
237	DeiT-S with iRPE-K	9.318	No	Rethinking and Improving Relative Position Encod...	2021-07-29	Code
238	SpineNet-143	9.1	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
239	DAT-S	9	No	Vision Transformer with Deformable Attention	2022-01-03	Code
240	CrossViT-18	9	No	CrossViT: Cross-Attention Multi-Scale Vision Tra...	2021-03-27	Code
241	Pyramid ViG-M	8.9	No	Vision GNN: An Image is Worth Graph of Nodes	2022-06-01	Code
242	EfficientNetV2-S (21k)	8.8	No	EfficientNetV2: Smaller Models and Faster Training	2021-04-01	Code
243	FasterViT-2	8.7	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
244	RDNet-S	8.7	No	DenseNets Reloaded: Paradigm Shift Beyond ResNet...	2024-03-28	Code
245	ViL-Medium-D	8.7	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
246	GFNet-H-B	8.6	No	Global Filter Networks for Image Classification	2021-07-01	Code
247	GC ViT-S	8.5	No	Global Context Vision Transformers	2022-06-20	Code
248	SE-CoTNetD-101	8.5	No	Contextual Transformer Networks for Visual Recog...	2021-07-26	Code
249	Shift-S	8.5	No	When Shift Operation Meets Vision Transformer: A...	2022-01-26	Code
250	SKNet-101	8.46	No	Selective Kernel Networks	2019-03-15	Code
251	CoAtNet-1	8.4	No	CoAtNet: Marrying Convolution and Attention for ...	2021-06-09	Code
252	SCARLET-A4	8.4	No	SCARLET-NAS: Bridging the Gap between Stability ...	2019-08-16	Code
253	Sequencer2D-S	8.4	No	Sequencer: Deep LSTM for Image Classification	2022-05-04	Code
254	Next-ViT-B	8.3	No	Next-ViT: Next Generation Vision Transformer for...	2022-07-12	Code
255	Container	8.1	No	Container: Context Aggregation Network	2021-06-02	Code
256	CAFormer-S36 (224 res, 21K)	8	No	MetaFormer Baselines for Vision	2022-10-24	Code
257	ELSA-VOLO-D1	8	No	ELSA: Enhanced Local Self-Attention for Vision T...	2021-12-23	Code
258	CAFormer-S36 (224 res)	8	No	MetaFormer Baselines for Vision	2022-10-24	Code
259	InternImage-S	8	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
260	GTP-LV-ViT-M/P8	8	No	GTP-ViT: Efficient Vision Transformers via Graph...	2023-11-06	Code
261	RegNetY-8.0GF	8	No	Designing Network Design Spaces	2020-03-30	Code
262	ResT-Large	7.9	No	ResT: An Efficient Transformer for Visual Recogn...	2021-05-28	Code
263	RandWire-WS	7.9	No	Exploring Randomly Wired Neural Networks for Ima...	2019-04-02	Code
264	SGE-ResNet101	7.858	No	Spatial Group-wise Enhance: Improving Semantic F...	2019-05-23	Code
265	DiNAT-Small	7.8	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
266	NAT-Small	7.8	No	Neighborhood Attention Transformer	2022-04-14	Code
267	IPT-B	7.8	No	IncepFormer: Efficient Inception Transformer wit...	2022-12-06	Code
268	MViT-B-16	7.8	No	Multiscale Vision Transformers	2021-04-22	Code
269	ConvFormer-S36 (224 res, 21K)	7.6	No	MetaFormer Baselines for Vision	2022-10-24	Code
270	ConvFormer-S36 (224 res)	7.6	No	MetaFormer Baselines for Vision	2022-10-24	Code
271	ResNet-101	7.6	No	Deep Residual Learning for Image Recognition	2015-12-10	Code
272	AOGNet-40M-AN	7.51	No	Attentive Normalization	2019-08-04	Code
273	LITv2-M	7.5	No	Fast Vision Transformers with HiLo Attention	2022-05-26	Code
274	MambaVision-S	7.5	No	MambaVision: A Hybrid Mamba-Transformer Vision B...	2024-07-10	Code
275	ScaleNet-101	7.5	No	Data-Driven Neuron Allocation for Scale Aggregat...	2019-04-20	Code
276	ECA-Net (ResNet-101)	7.35	No	ECA-Net: Efficient Channel Attention for Deep Co...	2019-10-08	Code
277	BoTNet T3	7.3	No	Bottleneck Transformers for Visual Recognition	2021-01-27	Code
278	Wave-ViT-B	7.2	No	Wave-ViT: Unifying Wavelet and Transformers for ...	2022-07-11	Code
279	CvT-21	7.1	No	CvT: Introducing Convolutions to Vision Transfor...	2021-03-29	Code
280	HCGNet-C	7.1	No	Gated Convolutional Networks with Hybrid Connect...	2019-08-26	Code
281	gSwin-S	7	No	gSwin: Gated MLP Vision Model with Hierarchical ...	2022-08-24	-
282	PVTv2-B3	6.9	No	PVT v2: Improved Baselines with Pyramid Vision T...	2021-06-25	Code
283	ViTAE-13M	6.8	No	ViTAE: Vision Transformer Advanced by Exploring ...	2021-06-07	Code
284	RedNet-152	6.8	No	Involution: Inverting the Inherence of Convoluti...	2021-03-10	Code
285	ViL-Base-W	6.74	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
286	LV-ViT-S	6.6	No	All Tokens Matter: Token Labeling for Training B...	2021-04-22	Code
287	EfficientViT-B3 (r288)	6.5	No	EfficientViT: Multi-Scale Linear Attention for H...	2022-05-29	Code
288	CI2P-ViT	6.442	No	Compress image to patches for Vision Transformer	2025-02-14	Code
289	CrossViT-15+	6.1	No	CrossViT: Cross-Attention Multi-Scale Vision Tra...	2021-03-27	Code
290	ResMLP-S24	6	No	ResMLP: Feedforward networks for image classific...	2021-05-07	Code
291	Next-ViT-S	5.8	No	Next-ViT: Next Generation Vision Transformer for...	2022-07-12	Code
292	Transformer local-attention (NesT-T)	5.8	No	Nested Hierarchical Transformer: Towards Accurat...	2021-05-26	Code
293	CrossViT-15	5.8	No	CrossViT: Cross-Attention Multi-Scale Vision Tra...	2021-03-27	Code
294	TransNeXt-Tiny (IN-1K supervised, 224)	5.7	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
295	MOAT-0 1K only	5.7	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
296	MaxViT-T (224res)	5.6	No	MaxViT: Multi-Axis Vision Transformer	2022-04-04	Code
297	ConViT-S	5.4	No	ConViT: Improving Vision Transformers with Soft ...	2021-03-19	Code
298	ResNeSt-50	5.39	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
299	EfficientViT-L1 (r224)	5.3	No	EfficientViT: Multi-Scale Linear Attention for H...	2022-05-29	Code
300	FasterViT-1	5.3	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
301	MambaVision-T2	5.1	No	MambaVision: A Hybrid Mamba-Transformer Vision B...	2024-07-10	Code
302	AutoFormer-small	5.1	No	AutoFormer: Searching Transformers for Visual Re...	2021-07-01	Code
303	MogaNet-S	5	No	MogaNet: Multi-order Gated Aggregation Network	2022-11-07	Code
304	RDNet-T	5	No	DenseNets Reloaded: Paradigm Shift Beyond ResNet...	2024-03-28	Code
305	VAN-B2	5	No	Visual Attention Network	2022-02-20	Code
306	Visformer-S	4.9	No	Visformer: The Vision-friendly Transformer	2021-04-26	Code
307	ViL-Small	4.86	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
308	ELSA-Swin-T	4.8	No	ELSA: Enhanced Local Self-Attention for Vision T...	2021-12-23	Code
309	GTP-LV-ViT-S/P8	4.8	No	GTP-ViT: Efficient Vision Transformers via Graph...	2023-11-06	Code
310	LocalViT-PVT	4.8	No	LocalViT: Bringing Locality to Vision Transformers	2021-04-12	Code
311	Wave-ViT-S	4.7	No	Wave-ViT: Unifying Wavelet and Transformers for ...	2022-07-11	Code
312	GC ViT-T	4.7	No	Global Context Vision Transformers	2022-06-20	Code
313	IPT-S	4.7	No	IncepFormer: Efficient Inception Transformer wit...	2022-12-06	Code
314	MViTv2-T	4.7	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
315	RVT-S*	4.7	No	Towards Robust Vision Transformer	2021-05-17	Code
316	RedNet-101	4.7	No	Involution: Inverting the Inherence of Convoluti...	2021-03-10	Code
317	ResNet-RS-50 (160 image res)	4.6	No	Revisiting ResNets: Improved Training and Scalin...	2021-03-13	Code
318	Pyramid ViG-S	4.6	No	Vision GNN: An Image is Worth Graph of Nodes	2022-06-01	Code
319	DAT-T	4.6	No	Vision Transformer with Deformable Attention	2022-01-03	Code
320	LocalViT-S	4.6	No	LocalViT: Bringing Locality to Vision Transformers	2021-04-12	Code
321	ViTAE-T-Stage	4.6	No	ViTAE: Vision Transformer Advanced by Exploring ...	2021-06-07	Code
322	ConvNeXt-T	4.5	No	A ConvNet for the 2020s	2022-01-10	Code
323	CeiT-S	4.5	No	Incorporating Convolution Designs into Visual Tr...	2021-03-22	Code
324	CvT-13	4.5	No	CvT: Introducing Convolutions to Vision Transfor...	2021-03-29	Code
325	Swin-T	4.5	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
326	QnA-ViT-Small	4.4	No	Learned Queries for Efficient Local Attention	2021-12-21	Code
327	MambaVision-T	4.4	No	MambaVision: A Hybrid Mamba-Transformer Vision B...	2024-07-10	Code
328	Shift-T	4.4	No	When Shift Operation Meets Vision Transformer: A...	2022-01-26	Code
329	GLiT-Smalls	4.4	No	GLiT: Neural Architecture Search for Global and ...	2021-07-07	Code
330	ResNeSt-50-fast	4.34	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
331	TinyViT-21M-distill (21k)	4.3	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
332	DAT-T++	4.3	No	DAT++: Spatially Dynamic Vision Transformer with...	2023-09-04	Code
333	NAT-Tiny	4.3	No	Neighborhood Attention Transformer	2022-04-14	Code
334	TinyViT-21M	4.3	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
335	DiNAT-Tiny	4.3	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
336	Mixer-S16 + STD	4.3	No	-	-	Code
337	EfficientNet-B4	4.2	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
338	CoAtNet-0	4.2	No	CoAtNet: Marrying Convolution and Attention for ...	2021-06-09	Code
339	SGE-ResNet50	4.127	No	Spatial Group-wise Enhance: Improving Semantic F...	2019-05-23	Code
340	CAFormer-S18 (224 res, 21K)	4.1	No	MetaFormer Baselines for Vision	2022-10-24	Code
341	CAFormer-S18 (224 res)	4.1	No	MetaFormer Baselines for Vision	2022-10-24	Code
342	CvT-13-NAS	4.1	No	CvT: Introducing Convolutions to Vision Transfor...	2021-03-29	Code
343	SE-CoTNetD-50	4.1	No	Contextual Transformer Networks for Visual Recog...	2021-07-26	Code
344	EfficientViT-B3 (r224)	4	No	EfficientViT: Multi-Scale Linear Attention for H...	2022-05-29	Code
345	CycleMLP-B2 + STD	4	No	-	-	Code
346	PVTv2-B2	4	No	PVT v2: Improved Baselines with Pyramid Vision T...	2021-06-25	Code
347	ActiveMLP-T	4	No	Active Token Mixer	2022-03-11	Code
348	RegNetY-4.0GF	4	No	Designing Network Design Spaces	2020-03-30	Code
349	ViTAE-6M	4	No	ViTAE: Vision Transformer Advanced by Exploring ...	2021-06-07	Code
350	ConvFormer-S18 (224 res, 21K)	3.9	No	MetaFormer Baselines for Vision	2022-10-24	Code
351	ConvFormer-S18 (224 res)	3.9	No	MetaFormer Baselines for Vision	2022-10-24	Code
352	ECA-Net (ResNet-50)	3.86	No	ECA-Net: Efficient Channel Attention for Deep Co...	2019-10-08	Code
353	ScaleNet-50	3.8	No	Data-Driven Neuron Allocation for Scale Aggregat...	2019-04-20	Code
354	ResNet-50	3.8	No	Deep Residual Learning for Image Recognition	2015-12-10	Code
355	LITv2-S	3.7	No	Fast Vision Transformers with HiLo Attention	2022-05-26	Code
356	DY-ResNet-18	3.7	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
357	UniFormer-S	3.6	No	UniFormer: Unifying Convolution and Self-attenti...	2022-01-24	Code
358	gSwin-T	3.6	No	gSwin: Gated MLP Vision Model with Hierarchical ...	2022-08-24	-
359	CeiT-T (384 finetune res)	3.6	No	Incorporating Convolution Designs into Visual Tr...	2021-03-22	Code
360	CAS-ViT-T	3.597	No	CAS-ViT: Convolutional Additive Self-attention V...	2024-08-07	Code
361	EdgeFormer-S	3.48	No	ParC-Net: Position Aware Circular Convolution wi...	2022-03-08	Code
362	ReXNet_3.0	3.4	No	Rethinking Channel Dimensions for Efficient Mode...	2020-07-02	Code
363	GTP-DeiT-S/P8	3.4	No	GTP-ViT: Efficient Vision Transformers via Graph...	2023-11-06	Code
364	RevBiFPN-S3	3.33	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
365	FasterViT-0	3.3	No	FasterViT: Fast Vision Transformers with Hierarc...	2023-06-09	Code
366	Container-Light	3.2	No	Container: Context Aggregation Network	2021-06-02	Code
367	ResMLP-12 (distilled, class-MLP)	3	No	ResMLP: Feedforward networks for image classific...	2021-05-07	Code
368	ViTAE-T	3	No	ViTAE: Vision Transformer Advanced by Exploring ...	2021-06-07	Code
369	MobileOne-S4	2.978	No	MobileOne: An Improved One millisecond Mobile Ba...	2022-06-08	Code
370	PiT-S	2.9	No	Rethinking Spatial Dimensions of Vision Transfor...	2021-03-30	Code
371	MobileOne-S4 (distill)	2.9	No	MobileOne: An Improved One millisecond Mobile Ba...	2022-06-08	Code
372	TransNeXt-Micro (IN-1K supervised, 224)	2.7	No	TransNeXt: Robust Foveal Visual Perception for V...	2023-11-28	Code
373	NAT-Mini	2.7	No	Neighborhood Attention Transformer	2022-04-14	Code
374	DiNAT-Mini	2.7	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
375	RedNet-50	2.7	No	Involution: Inverting the Inherence of Convoluti...	2021-03-10	Code
376	GC ViT-XT	2.6	No	Global Context Vision Transformers	2022-06-20	Code
377	EdgeNeXt-S	2.6	No	EdgeNeXt: Efficiently Amalgamated CNN-Transforme...	2022-06-21	Code
378	LR-Net-26	2.6	No	Local Relation Networks for Image Recognition	2019-04-25	Code
379	DeiT-Ti with iRPE-K	2.568	No	Rethinking and Improving Relative Position Encod...	2021-07-29	Code
380	QnA-ViT-Tiny	2.5	No	Learned Queries for Efficient Local Attention	2021-12-21	Code
381	VAN-B1	2.5	No	Visual Attention Network	2022-02-20	Code
382	UniNet-B2	2.4	No	UniNet: Unified Architecture Search with Convolu...	2021-10-08	-
383	HVT-S-1	2.4	No	Scalable Vision Transformers with Hierarchical P...	2021-03-19	Code
384	LeViT-384	2.334	No	LeViT: a Vision Transformer in ConvNet's Clothin...	2021-04-02	Code
385	IPT-T	2.3	No	IncepFormer: Efficient Inception Transformer wit...	2022-12-06	Code
386	gSwin-VT	2.3	No	gSwin: Gated MLP Vision Model with Hierarchical ...	2022-08-24	-
387	RedNet-38	2.2	No	Involution: Inverting the Inherence of Convoluti...	2021-03-10	Code
388	Ghost-ResNet-50 (s=2)	2.2	No	GhostNet: More Features from Cheap Operations	2019-11-27	Code
389	FBNetV5-F-CLS	2.1	No	FBNetV5: Neural Architecture Search for Multiple...	2021-11-19	-
390	EfficientViT-B2 (r256)	2.1	No	EfficientViT: Multi-Scale Linear Attention for H...	2022-05-29	Code
391	GC ViT-XXT	2.1	No	Global Context Vision Transformers	2022-06-20	Code
392	PVTv2-B1	2.1	No	PVT v2: Improved Baselines with Pyramid Vision T...	2021-06-25	Code
393	TinyViT-11M-distill (21k)	2	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
394	CloFormer-S	2	No	Rethinking Local Perception in Lightweight Visio...	2023-03-31	Code
395	TinyViT-11M	2	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
396	HCGNet-B	2	No	Gated Convolutional Networks with Hybrid Connect...	2019-08-26	Code
397	ConViT-Ti+	2	No	ConViT: Improving Vision Transformers with Soft ...	2021-03-19	Code
398	ResT-Small	1.9	No	ResT: An Efficient Transformer for Visual Recogn...	2021-05-28	Code
399	MobileOne-S3	1.896	No	MobileOne: An Improved One millisecond Mobile Ba...	2022-06-08	Code
400	CAS-ViT-M	1.887	No	CAS-ViT: Convolutional Additive Self-attention V...	2024-08-07	Code
401	NASViT (supernet)	1.881	No	-	-	Code
402	MobileViTv3-1.0	1.876	No	MobileViTv3: Mobile-Friendly Vision Transformer ...	2022-09-30	Code
403	MobileViTv3-S	1.841	No	MobileViTv3: Mobile-Friendly Vision Transformer ...	2022-09-30	Code
404	DY-ResNet-10	1.82	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
405	HRFormer-T	1.8	No	HRFormer: High-Resolution Transformer for Dense ...	2021-10-18	Code
406	MobileViTv2-1.0	1.8	No	Separable Self-attention for Mobile Vision Trans...	2022-06-06	Code
407	DVT (T2T-ViT-12)	1.7	No	Not All Images are Worth 16x16 Words: Dynamic Tr...	2021-05-31	Code
408	Pyramid ViG-Ti	1.7	No	Vision GNN: An Image is Worth Graph of Nodes	2022-06-01	Code
409	RedNet-26	1.7	No	Involution: Inverting the Inherence of Convoluti...	2021-03-10	Code
410	FixEfficientNet-B0	1.6	No	Fixing the train-test resolution discrepancy: Fi...	2020-03-18	Code
411	RegNetY-1.6GF	1.6	No	Designing Network Design Spaces	2020-03-30	Code
412	ReXNet_2.0	1.5	No	Rethinking Channel Dimensions for Efficient Mode...	2020-07-02	Code
413	MogaNet-T (256res)	1.44	No	MogaNet: Multi-order Gated Aggregation Network	2022-11-07	Code
414	PiT-XS	1.4	No	Rethinking Spatial Dimensions of Vision Transfor...	2021-03-30	Code
415	GLiT-Tinys	1.4	No	GLiT: Neural Architecture Search for Global and ...	2021-07-07	Code
416	LocalViT-TNT	1.4	No	LocalViT: Bringing Locality to Vision Transformers	2021-04-12	Code
417	RevBiFPN-S2	1.37	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
418	TinyViT-5M-distill (21k)	1.3	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
419	RVT-Ti*	1.3	No	Towards Robust Vision Transformer	2021-05-17	Code
420	TinyViT-5M	1.3	No	TinyViT: Fast Pretraining Distillation for Small...	2022-07-21	Code
421	Visformer-Ti	1.3	No	Visformer: The Vision-friendly Transformer	2021-04-26	Code
422	ViL-Tiny-RPB	1.3	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
423	LocalViT-T	1.3	No	LocalViT: Bringing Locality to Vision Transformers	2021-04-12	Code
424	AutoFormer-tiny	1.3	No	AutoFormer: Searching Transformers for Visual Re...	2021-07-01	Code
425	MobileOne-S2	1.299	No	MobileOne: An Improved One millisecond Mobile Ba...	2022-06-08	Code
426	SReT-LT (Fast Knowledge Distillation)	1.2	No	A Fast Knowledge Distillation Framework for Visu...	2021-12-02	Code
427	CeiT-T	1.2	No	Incorporating Convolution Designs into Visual Tr...	2021-03-22	Code
428	Ghost-ResNet-50 (s=4)	1.2	No	GhostNet: More Features from Cheap Operations	2019-11-27	Code
429	LocalViT-T2T	1.2	No	LocalViT: Bringing Locality to Vision Transformers	2021-04-12	Code
430	MobileNet-224 (CGD)	1.198	No	Compact Global Descriptor for Neural Networks	2019-07-23	Code
431	MobileNetV2 (1.4)	1.17	No	MobileNetV2: Inverted Residuals and Linear Bottl...	2018-01-13	Code
432	MobileNet-224 ×1.25	1.138	No	MobileNets: Efficient Convolutional Neural Netwo...	2017-04-17	Code
433	CloFormer-XS	1.1	No	Rethinking Local Perception in Lightweight Visio...	2023-03-31	Code
434	SReT-T	1.1	No	Sliced Recursive Transformer	2021-11-09	Code
435	LeViT-256	1.066	No	LeViT: a Vision Transformer in ConvNet's Clothin...	2021-04-02	Code
436	MobileViTv3-0.75	1.064	No	MobileViTv3: Mobile-Friendly Vision Transformer ...	2022-09-30	Code
437	MogaNet-XT (256res)	1.04	No	MogaNet: Multi-order Gated Aggregation Network	2022-11-07	Code
438	FBNetV5-C-CLS	1	No	FBNetV5: Neural Architecture Search for Multiple...	2021-11-19	-
439	EfficientNet-B2	1	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
440	MobileViTv2-0.75	1	No	Separable Self-attention for Mobile Vision Trans...	2022-06-06	Code
441	ConViT-Ti	1	No	ConViT: Improving Vision Transformers with Soft ...	2021-03-19	Code
442	UniNet-B1	0.99	No	UniNet: Unified Architecture Search with Convolu...	2021-10-08	-
443	CAS-ViT-S	0.932	No	CAS-ViT: Convolutional Additive Self-attention V...	2024-08-07	Code
444	MobileViTv3-XS	0.927	No	MobileViTv3: Mobile-Friendly Vision Transformer ...	2022-09-30	Code
445	VAN-B0	0.9	No	Visual Attention Network	2022-02-20	Code
446	ReXNet_1.5	0.86	No	Rethinking Channel Dimensions for Efficient Mode...	2020-07-02	Code
447	EfficientNet-B0 (CondConv)	0.826	No	CondConv: Conditionally Parameterized Convolutio...	2019-04-10	Code
448	MobileOne-S1	0.825	No	MobileOne: An Improved One millisecond Mobile Ba...	2022-06-08	Code
449	ZenNet-400M-SE	0.82	No	Zen-NAS: A Zero-Shot NAS for High-Performance De...	2021-02-01	Code
450	MnasNet-A3	0.806	No	MnasNet: Platform-Aware Neural Architecture Sear...	2018-07-31	Code
451	RegNetY-800MF	0.8	No	Designing Network Design Spaces	2020-03-30	Code
452	FairNAS-A	0.776	No	FairNAS: Rethinking Evaluation Fairness of Weigh...	2019-07-03	Code
453	NASViT-A5	0.757	No	-	-	Code
454	SCARLET-A	0.73	No	SCARLET-NAS: Bridging the Gap between Stability ...	2019-08-16	Code
455	FBNetV5	0.726	No	FBNetV5: Neural Architecture Search for Multiple...	2021-11-19	-
456	AlphaNet-A6	0.709	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
457	DVT (T2T-ViT-10)	0.7	No	Not All Images are Worth 16x16 Words: Dynamic Tr...	2021-05-31	Code
458	EfficientNet-B1	0.7	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
459	MobileViT-XS	0.7	No	MobileViT: Light-weight, General-purpose, and Mo...	2021-10-05	Code
460	PiT-Ti	0.7	No	Rethinking Spatial Dimensions of Vision Transfor...	2021-03-30	Code
461	SReT-ExT	0.7	No	Sliced Recursive Transformer	2021-11-09	Code
462	FairNAS-B	0.69	No	FairNAS: Rethinking Evaluation Fairness of Weigh...	2019-07-03	Code
463	FBNetV5-A-CLS	0.685	No	FBNetV5: Neural Architecture Search for Multiple...	2021-11-19	-
464	MnasNet-A2	0.68	No	MnasNet: Platform-Aware Neural Architecture Sear...	2018-07-31	Code
465	ReXNet_1.3	0.66	No	Rethinking Channel Dimensions for Efficient Mode...	2020-07-02	Code
466	SCARLET-B	0.658	No	SCARLET-NAS: Bridging the Gap between Stability ...	2019-08-16	Code
467	FairNAS-C	0.642	No	FairNAS: Rethinking Evaluation Fairness of Weigh...	2019-07-03	Code
468	HVT-Ti-1	0.64	No	Scalable Vision Transformers with Hierarchical P...	2021-03-19	Code
469	MUXNet-l	0.636	No	MUXConv: Information Multiplexing in Convolution...	2020-03-31	Code
470	LeViT-192	0.624	No	LeViT: a Vision Transformer in ConvNet's Clothin...	2021-04-02	Code
471	MnasNet-A1	0.624	No	MnasNet: Platform-Aware Neural Architecture Sear...	2018-07-31	Code
472	RevBiFPN-S1	0.62	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
473	MoGA-A	0.608	No	MoGA: Searching Beyond MobileNetV3	2019-08-04	Code
474	ESPNetv2	0.602	No	ESPNetv2: A Light-weight, Power Efficient, and G...	2018-11-28	Code
475	DVT (T2T-ViT-7)	0.6	No	Not All Images are Worth 16x16 Words: Dynamic Tr...	2021-05-31	Code
476	CloFormer-XXS	0.6	No	Rethinking Local Perception in Lightweight Visio...	2023-03-31	Code
477	RegNetY-600MF	0.6	No	Designing Network Design Spaces	2020-03-30	Code
478	MobileNetV2	0.6	Yes	MobileNetV2: Inverted Residuals and Linear Bottl...	2018-01-13	Code
479	PVTv2-B0	0.6	No	PVT v2: Improved Baselines with Pyramid Vision T...	2021-06-25	Code
480	ShuffleNet V2	0.597	No	ShuffleNet V2: Practical Guidelines for Efficien...	2018-07-30	Code
481	NASViT-A4	0.591	No	-	-	Code
482	TinyNet (GhostNet-A)	0.591	No	Model Rubik's Cube: Twisting Resolution, Depth a...	2020-10-28	Code
483	RandWire-WS (small)	0.583	No	Exploring Randomly Wired Neural Networks for Ima...	2019-04-02	Code
484	MixNet-L	0.565	No	MixConv: Mixed Depthwise Convolutional Kernels	2019-07-22	Code
485	UniNet-B0	0.56	No	UniNet: Unified Architecture Search with Convolu...	2021-10-08	-
486	CAS-ViT-XS	0.56	No	CAS-ViT: Convolutional Additive Self-attention V...	2024-08-07	Code
487	SCARLET-C	0.56	No	SCARLET-NAS: Bridging the Gap between Stability ...	2019-08-16	Code
488	UniNet-B0	0.555	No	UniNet: Unified Architecture Search with Convolu...	2022-07-12	Code
489	DiCENet	0.553	No	DiCENet: Dimension-wise Convolutions for Efficie...	2019-06-08	Code
490	NASViT-A3	0.528	No	-	-	Code
491	EdgeNeXt-XXS	0.522	No	EdgeNeXt: Efficiently Amalgamated CNN-Transforme...	2022-06-21	Code
492	MobileViTv2-0.5	0.5	No	Separable Self-attention for Mobile Vision Trans...	2022-06-06	Code
493	AlphaNet-A5	0.491	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
494	MobileViTv3-0.5	0.481	No	MobileViTv3: Mobile-Friendly Vision Transformer ...	2022-09-30	Code
495	AlphaNet-A4	0.444	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
496	MobileNet V3-Large 1.0	0.438	No	Searching for MobileNetV3	2019-05-06	Code
497	MUXNet-m	0.436	No	MUXConv: Information Multiplexing in Convolution...	2020-03-31	Code
498	DY-MobileNetV2 ×0.75	0.435	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
499	AsymmNet-Large ×1.0	0.4338	No	AsymmNet: Towards ultralight convolution neural ...	2021-04-15	Code
500	NASViT-A2	0.421	No	-	-	Code
501	ReXNet_1.0	0.4	No	Rethinking Channel Dimensions for Efficient Mode...	2020-07-02	Code
502	RegNetY-400MF	0.4	No	Designing Network Design Spaces	2020-03-30	Code
503	DGPPF-ResNet50	0.4	No	-	-	Code
504	EfficientNet-B0	0.39	No	EfficientNet: Rethinking Model Scaling for Convo...	2019-05-28	Code
505	LeViT-128	0.376	No	LeViT: a Vision Transformer in ConvNet's Clothin...	2021-04-02	Code
506	FBNet-C	0.375	No	FBNet: Hardware-Aware Efficient ConvNet Design v...	2018-12-09	Code
507	GreedyNAS-A	0.366	No	GreedyNAS: Towards Fast One-Shot NAS with Greedy...	2020-03-25	-
508	SkipblockNet-L	0.364	No	Bias Loss for Mobile Neural Networks	2021-07-23	Code
509	MixNet-M	0.36	No	MixConv: Mixed Depthwise Convolutional Kernels	2019-07-22	Code
510	AlphaNet-A3	0.357	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
511	ReXNet_0.9	0.35	No	Rethinking Channel Dimensions for Efficient Mode...	2020-07-02	Code
512	TinyNet-A + RA	0.339	No	Model Rubik's Cube: Twisting Resolution, Depth a...	2020-10-28	Code
513	GreedyNAS-B	0.324	No	GreedyNAS: Towards Fast One-Shot NAS with Greedy...	2020-03-25	-
514	ECA-Net (MobileNetV2)	0.32	No	ECA-Net: Efficient Channel Attention for Deep Co...	2019-10-08	Code
515	AlphaNet-A2	0.317	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
516	RevBiFPN-S0	0.31	No	RevBiFPN: The Fully Reversible Bidirectional Fea...	2022-06-28	Code
517	NASViT-A1	0.309	No	-	-	Code
518	MobileViTv3-XXS	0.289	No	MobileViTv3: Mobile-Friendly Vision Transformer ...	2022-09-30	Code
519	LeViT-128S	0.288	No	LeViT: a Vision Transformer in ConvNet's Clothin...	2021-04-02	Code
520	GreedyNAS-C	0.284	No	GreedyNAS: Towards Fast One-Shot NAS with Greedy...	2020-03-25	-
521	FBNetV5-AC-CLS	0.28	No	FBNetV5: Neural Architecture Search for Multiple...	2021-11-19	-
522	AlphaNet-A1	0.279	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
523	MobileOne-S0 (distill)	0.275	No	MobileOne: An Improved One millisecond Mobile Ba...	2022-06-08	Code
524	MixNet-S	0.256	No	MixConv: Mixed Depthwise Convolutional Kernels	2019-07-22	Code
525	SkipblockNet-M	0.246	No	Bias Loss for Mobile Neural Networks	2021-07-23	Code
526	MUXNet-s	0.234	No	MUXConv: Information Multiplexing in Convolution...	2020-03-31	Code
527	GhostNet ×1.3	0.226	No	GhostNet: More Features from Cheap Operations	2019-11-27	Code
528	FBNetV5-AR-CLS	0.215	No	FBNetV5: Neural Architecture Search for Multiple...	2021-11-19	-
529	CoE-Large + CondConv	0.214	No	Collaboration of Experts: Achieving 80% Top-1 Ac...	2021-07-08	-
530	NASViT-A0	0.208	No	-	-	Code
531	AlphaNet-A0	0.203	No	AlphaNet: Improved Training of Supernets with Al...	2021-02-16	Code
532	DY-MobileNetV2 ×0.5	0.203	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
533	DGPPF-ResNet18	0.2	No	-	-	Code
534	BasisNet-MV3	0.198	No	BasisNet: Two-stage Model Synthesis for Efficien...	2021-05-07	-
535	CoE-Large	0.194	No	Collaboration of Experts: Achieving 80% Top-1 Ac...	2021-07-08	-
536	GhostNet ×1.0	0.141	No	GhostNet: More Features from Cheap Operations	2019-11-27	Code
537	DY-MobileNetV3-Small	0.137	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
538	AsymmNet-Large ×0.5	0.1344	No	AsymmNet: Towards ultralight convolution neural ...	2021-04-15	Code
539	MUXNet-xs	0.132	No	MUXConv: Information Multiplexing in Convolution...	2020-03-31	Code
540	DY-MobileNetV2 ×0.35	0.124	No	Dynamic Convolution: Attention over Convolution ...	2019-12-07	Code
541	AsymmNet-Small ×1.0	0.1154	No	AsymmNet: Towards ultralight convolution neural ...	2021-04-15	Code
542	CoE-Small + CondConv + PWLU	0.1	No	Collaboration of Experts: Achieving 80% Top-1 Ac...	2021-07-08	-
543	DGPPF-MobileNetV2	0.1	No	-	-	Code
544	GhostNet ×0.5	0.042	No	GhostNet: More Features from Cheap Operations	2019-11-27	Code

#1InternImage-H SOTA
1478
GFLOPs· 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions Code
#2DaViT-G SOTA
1038
GFLOPs· 2022-04-07
DaViT: Dual Attention Vision Transformers Code
#3SWAG (ViT H/14) SOTA
1018.8
GFLOPs· 2022-01-20
Revisiting Weakly Supervised Pre-Training of Visual Perception Models Code
#4MViTv2-H (512 res, ImageNet-21k pretrain) SOTA
763.5
GFLOPs· 2021-12-02
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Code
#5Perceiver (FF) SOTA
707.2
GFLOPs· 2021-03-04
Perceiver: General Perception with Iterative Attention Code
#6MOAT-4 22K+1K
648.5
GFLOPs· 2022-10-04
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Code
#7DY-MobileNetV2 ×1.0 SOTA
626
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#8FixEfficientNet-L2
585
GFLOPs· 2020-03-18
Fixing the train-test resolution discrepancy: FixEfficientNet Code
#9MambaVision-L3
489.1
GFLOPs· 2024-07-10
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Code
#10ELSA-VOLO-D5 (512*512)
437
GFLOPs· 2021-12-23
ELSA: Enhanced Local Self-Attention for Vision Transformer Code
#11XCiT-L24
417.9
GFLOPs· 2021-06-17
XCiT: Cross-Covariance Image Transformers Code
#12VOLO-D5+HAT
412
GFLOPs· 2022-04-03
Improving Vision Transformers by Revisiting High-frequency Components Code
#13VOLO-D5
412
GFLOPs· 2021-06-24
VOLO: Vision Outlooker for Visual Recognition Code
#14CaiT-M-48-448
377.3
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#15NFNet-F6 w/ SAM
377.28
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#16NFNet-F4+
367
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#17DaViT-H
334
GFLOPs· 2022-04-07
DaViT: Dual Attention Vision Transformers Code
#18ResNeXt-101 32x48d SOTA
306
GFLOPs· 2018-05-02
Exploring the Limits of Weakly Supervised Pretraining Code
#19NFNet-F5 w/ SAM
289.76
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#20NFNet-F5
289.76
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#21MOAT-3 1K only
271
GFLOPs· 2022-10-04
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Code
#22CAIT-M36-448
247.8
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#23NFNet-F4
215.24
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#24LV-ViT-L
214.8
GFLOPs· 2021-04-22
All Tokens Matter: Token Labeling for Training Better Vision Transformers Code
#25AmoebaNet-A SOTA
208
GFLOPs· 2018-02-05
Regularized Evolution for Image Classifier Architecture Search Code
#26VOLO-D4
197
GFLOPs· 2021-06-24
VOLO: Vision Outlooker for Visual Recognition Code
#27ViT-L
191.2
GFLOPs· 2022-04-14
DeiT III: Revenge of the ViT Code
#28XCiT-M24
188
GFLOPs· 2021-06-17
XCiT: Cross-Covariance Image Transformers Code
#29ConvNeXt-XL (ImageNet-22k)
179
GFLOPs· 2022-01-10
A ConvNet for the 2020s Code
#30ResNeXt-101 32x32d
174
GFLOPs· 2018-05-02
Exploring the Limits of Weakly Supervised Pretraining Code
#31CAIT-M-36
173.3
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#32InternImage-XL
163
GFLOPs· 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions Code
#33FasterViT-6
142
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#34MViTv2-L (384 res, ImageNet-21k pretrain)
140.7
GFLOPs· 2021-12-02
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Code
#35MViTv2-L (384 res)
140.2
GFLOPs· 2021-12-02
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Code
#36RepLKNet-XL
128.7
GFLOPs· 2022-03-13
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs Code
#37MViTv2-H (ImageNet-21k pretrain)
120.6
GFLOPs· 2021-12-02
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Code
#38CAIT-M-24
116.1
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#39NFNet-F3
114.76
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#40VAN-B6 (22K, 384res)
114.3
GFLOPs· 2022-02-20
Visual Attention Network Code
#41CoAtNet-3 @384
114
GFLOPs· 2021-06-09
CoAtNet: Marrying Convolution and Attention for All Data Sizes Code
#42FasterViT-5
113
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#43InternImage-L
108
GFLOPs· 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions Code
#44XCiT-S24
106
GFLOPs· 2021-06-17
XCiT: Cross-Covariance Image Transformers Code
#45Swin-L
103.9
GFLOPs· 2021-03-25
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Code
#46DaViT-L (ImageNet-22k)
103
GFLOPs· 2022-04-07
DaViT: Dual Attention Vision Transformers Code
#47MogaNet-XL (384res)
102
GFLOPs· 2022-11-07
MogaNet: Multi-order Gated Aggregation Network Code
#48HorNet-L (GF)
101.8
GFLOPs· 2022-07-28
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions Code
#49DiNAT_s-Large (384res; Pretrained on IN22K@224)
101.5
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#50ConvNeXt-L (384 res)
101
GFLOPs· 2022-01-10
A ConvNet for the 2020s Code
#51Mini-Swin-B@384
98.8
GFLOPs· 2022-04-14
MiniViT: Compressing Vision Transformers with Weight Multiplexing Code
#52CSWin-L (384 res,ImageNet-22k pretrain)
96.8
GFLOPs· 2021-07-01
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows Code
#53EfficientNetV2-XL (21k)
94
GFLOPs· Extra Data· 2021-04-01
EfficientNetV2: Smaller Models and Faster Training Code
#54DiNAT-Large (11x11ks; 384res; Pretrained on IN22K@224)
92.4
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#55DiNAT-Large (384x384; Pretrained on ImageNet-22K @ 224x224)
89.7
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#56FixEfficientNet-B7
82
GFLOPs· 2020-03-18
Fixing the train-test resolution discrepancy: FixEfficientNet Code
#57CAFormer-B36 (384 res, 21K)
72.2
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#58CAFormer-B36 (384 res)
72.2
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#59ResNeXt-101 32×16d
72
GFLOPs· 2018-05-02
Exploring the Limits of Weakly Supervised Pretraining Code
#60VOLO-D3
67.9
GFLOPs· 2021-06-24
VOLO: Vision Outlooker for Visual Recognition Code
#61MIRL (ViT-B-48)
67
GFLOPs· 2023-09-25
Masked Image Residual Learning for Scaling Deeper Vision Transformers Code
#62ConvFormer-B36 (384 res, 21K)
66.5
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#63ConvFormer-B36 (384 res)
66.5
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#64CAIT-S-48
63.8
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#65NFNet-F2
62.59
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#66SE-ResNeXt-101, 64x4d, S=2(416px)
61.1
GFLOPs· 2020-11-30
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training Code
#67CLCNet (S:ViT+D:VOLO-D3) (retrain)
57.46
GFLOPs· 2022-05-19
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network Code
#68TransNeXt-Base (IN-1K supervised, 384)
56.3
GFLOPs· 2023-11-28
TransNeXt: Robust Foveal Visual Perception for Vision Transformers Code
#69XCiT-S12
55.6
GFLOPs· 2021-06-17
XCiT: Cross-Covariance Image Transformers Code
#70ResNet-RS-270 (256 image res)
54
GFLOPs· 2021-03-13
Revisiting ResNets: Improved Training and Scaling Strategies Code
#71EfficientNetV2-L (21k)
53
GFLOPs· 2021-04-01
EfficientNetV2: Smaller Models and Faster Training Code
#72EfficientNetV2-L
53
GFLOPs· 2021-04-01
EfficientNetV2: Smaller Models and Faster Training Code
#73CLCNet (S:ViT+D:EffNet-B7) (retrain)
51.93
GFLOPs· 2022-05-19
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network Code
#74UniNet-B6
51
GFLOPs· 2022-07-12
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP Code
#75Sequencer2D-L↑392
50.7
GFLOPs· 2022-05-04
Sequencer: Deep LSTM for Image Classification Code
#76VAN-B5 (22K, 384res)
50.6
GFLOPs· 2022-02-20
Visual Attention Network Code
#77PNASNet-5 SOTA
50
GFLOPs· 2017-12-02
Progressive Neural Architecture Search Code
#78DAT-B (384 res, IN-1K only)
49.8
GFLOPs· 2022-01-03
Vision Transformer with Deformable Attention Code
#79DAT-B++ (384x384)
49.7
GFLOPs· 2023-09-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention Code
#80CAIT-S-36
48
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#81CLCNet (S:D1+D:D5)
47.43
GFLOPs· 2022-05-19
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network Code
#82Swin-B
47
GFLOPs· 2021-03-25
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Code
#83Conformer-B
46.6
GFLOPs· 2021-05-09
Conformer: Local Features Coupling Global Representations for Visual Recognition Code
#84DaViT-B (ImageNet-22k)
46.4
GFLOPs· 2022-04-07
DaViT: Dual Attention Vision Transformers Code
#85CLCNet (S:ConvNeXt-L+D:EffNet-B7) (retrain)
45.43
GFLOPs· 2022-05-19
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network Code
#86MaxViT-L (224res)
43.9
GFLOPs· 2022-04-04
MaxViT: Multi-Axis Vision Transformer Code
#87SReT-S (512 res, ImageNet-1K only)
42.8
GFLOPs· 2021-11-09
Sliced Recursive Transformer Code
#88CAFormer-M36 (384 res, 21K)
42
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#89CAFormer-M36 (384 res)
42
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#90LITv2-B|384
39.7
GFLOPs· 2022-05-26
Fast Vision Transformers with HiLo Attention Code
#91UniFormer-L (384 res)
39.2
GFLOPs· 2022-01-24
UniFormer: Unifying Convolution and Self-attention for Visual Recognition Code
#92VAN-B6 (22K)
38.9
GFLOPs· 2022-02-20
Visual Attention Network Code
#93SE-ResNeXt-101, 64x4d, S=2(320px)
38.2
GFLOPs· 2020-11-30
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training Code
#94RevBiFPN-S6
38.1
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#95ConvFormer-M36 (384 res, 21K)
37.7
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#96ConvFormer-M36 (384 res)
37.7
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#97NoisyStudent (EfficientNet-B7)
37
GFLOPs· 2019-11-11
Self-training with Noisy Student improves ImageNet classification Code
#98EfficientNet-B7
37
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#99FasterViT-4
36.6
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#100ActiveMLP-L
36.4
GFLOPs· 2022-03-11
Active Token Mixer Code
#101VAN-B4 (22K, 384res)
35.9
GFLOPs· 2022-02-20
Visual Attention Network Code
#102NFNet-F1
35.54
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#103DeiT-B with iRPE-K
35.368
GFLOPs· 2021-07-29
Rethinking and Improving Relative Position Encoding for Vision Transformer Code
#104MambaVision-L
34.9
GFLOPs· 2024-07-10
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Code
#105RDNet-L (384 res)
34.7
GFLOPs· 2024-03-28
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Code
#106RDNet-L
34.7
GFLOPs· 2024-03-28
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Code
#107CoAtNet-3
34.7
GFLOPs· 2021-06-09
CoAtNet: Marrying Convolution and Attention for All Data Sizes Code
#108DiNAT_s-Large (224x224; Pretrained on ImageNet-22K @ 224x224)
34.5
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#109T2T-ViT-14|384
34.2
GFLOPs· 2021-01-28
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Code
#110MViT-B-24
32.7
GFLOPs· 2021-04-22
Multiscale Vision Transformers Code
#111CAIT-S-24
32.2
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#112TransNeXt-Small (IN-1K supervised, 384)
32.1
GFLOPs· 2023-11-28
TransNeXt: Robust Foveal Visual Perception for Vision Transformers Code
#113Next-ViT-L @384
32
GFLOPs· 2022-07-12
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios Code
#114VVT-L (384 res)
31.8
GFLOPs· 2022-06-21
Vicinity Vision Transformer Code
#115gMLP-B
31.6
GFLOPs· 2021-05-17
Pay Attention to MLPs Code
#116ResNeXt-101 64x4 SOTA
31.5
GFLOPs· 2016-11-16
Aggregated Residual Transformations for Deep Neural Networks Code
#117Harm-SE-RNX-101 64x4d (320x320, Mean-Max Pooling)
31.4
GFLOPs· 2020-01-18
Harmonic Convolutional Networks based on Discrete Cosine Transform Code
#118TinySaver(ConvNeXtV2_h, 0.01 Acc drop)
31.17
GFLOPs· 2024-03-26
Tiny Models are the Computational Saver for Large Models Code
#119T2T-ViTt-24
30
GFLOPs· 2021-01-28
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Code
#120ConViT-B+
30
GFLOPs· 2021-03-19
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases Code
#121CAIT-XS-36
28.8
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#122ViTAE-B-Stage
27.6
GFLOPs· 2021-06-07
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Code
#123T2T-ViT-24
27.6
GFLOPs· 2021-01-28
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Code
#124TinyViT-21M-512-distill (512 res, 21k)
27
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#125SE-CoTNetD-152
26.5
GFLOPs· 2021-07-26
Contextual Transformer Networks for Visual Recognition Code
#126CAFormer-S36 (384 res, 21K)
26
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#127CAFormer-S36 (384 res)
26
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#128CvT-21 (384 res, ImageNet-22k pretrain)
25
GFLOPs· 2021-03-29
CvT: Introducing Convolutions to Vision Transformers Code
#129CvT-21 (384 res)
24.9
GFLOPs· 2021-03-29
CvT: Introducing Convolutions to Vision Transformers Code
#130ResMLP-B24 + STD
24.1
GFLOPs
No paper Code
#131EfficientNetV2-M (21k)
24
GFLOPs· 2021-04-01
EfficientNetV2: Smaller Models and Faster Training Code
#132NASNET-A(6)
23.8
GFLOPs· 2017-07-21
Learning Transferable Architectures for Scalable Image Recognition Code
#133MaxViT-B (224res)
23.4
GFLOPs· 2022-04-04
MaxViT: Multi-Axis Vision Transformer Code
#134CAFormer-B36 (224 res, 21K)
23.2
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#135CAFormer-B36 (224 res)
23.2
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#136UniNet-B5
23.2
GFLOPs· 2021-10-08
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
#137MetaFormer PoolFormer-M48
23.2
GFLOPs· 2021-11-22
MetaFormer Is Actually What You Need for Vision Code
#138ConvFormer-B36 (224 res, 21K)
22.6
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#139ConvFormer-B36 (224 res)
22.6
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#140ConvFormer-S36 (384 res, 21K)
22.4
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#141ConvFormer-S36 (384 res)
22.4
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#142Oct-ResNet-152 (SE)
22.2
GFLOPs· 2019-04-10
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution Code
#143RevBiFPN-S5
21.8
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#144UniNet-B5
20.4
GFLOPs· 2022-07-12
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP Code
#145EfficientViT-L2 (r384)
20
GFLOPs· 2022-05-29
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction Code
#146T2T-ViTt-19
19.6
GFLOPs· 2021-01-28
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Code
#147TinySaver(ConvNeXtV2_h, 0.5 Acc drop)
19.41
GFLOPs· 2024-03-26
Tiny Models are the Computational Saver for Large Models Code
#148CAIT-XS-24
19.3
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#149BoTNet T5
19.3
GFLOPs· 2021-01-27
Bottleneck Transformers for Visual Recognition Code
#150EfficientNet-B6
19
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#151MIRL(ViT-S-54)
18.8
GFLOPs· 2023-09-25
Masked Image Residual Learning for Scaling Deeper Vision Transformers Code
#152ResNeXt-101, 64x4d, S=2(224px)
18.8
GFLOPs· 2020-11-30
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training Code
#153CLCNet (S:B4+D:B7)
18.58
GFLOPs· 2022-05-19
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network Code
#154SReT-S (384 res, ImageNet-1K only)
18.5
GFLOPs· 2021-11-09
Sliced Recursive Transformer Code
#155RepVGG-B2
18.4
GFLOPs· 2021-01-11
RepVGG: Making VGG-style ConvNets Great Again Code
#156FasterViT-3
18.2
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#157Transformer local-attention (NesT-B)
17.9
GFLOPs· 2021-05-26
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding Code
#158RVT-B*
17.7
GFLOPs· 2021-05-17
Towards Robust Vision Transformer Code
#159VAN-B5 (22K)
17.2
GFLOPs· 2022-02-20
Visual Attention Network Code
#160KAT-B*
17.06
GFLOPs· 2024-09-16
Kolmogorov-Arnold Transformer Code
#161ConViT-B
17
GFLOPs· 2021-03-19
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases Code
#162GLiT-Bases
17
GFLOPs· 2021-07-07
GLiT: Neural Architecture Search for Global and Local Image Transformer Code
#163T2T-ViT-19
17
GFLOPs· 2021-01-28
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Code
#164DeiT-B
16.87
GFLOPs· 2024-09-16
Kolmogorov-Arnold Transformer Code
#165ViT-B/16
16.87
GFLOPs· 2024-09-16
Kolmogorov-Arnold Transformer Code
#166Pyramid ViG-B
16.8
GFLOPs· 2022-06-01
Vision GNN: An Image is Worth Graph of Nodes Code
#167DAT-B++ (224x224)
16.6
GFLOPs· 2023-09-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention Code
#168Sequencer2D-L
16.6
GFLOPs· 2022-05-04
Sequencer: Deep LSTM for Image Classification Code
#169MixMIM-B
16.3
GFLOPs· 2022-05-26
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers Code
#170CvT-13 (384 res)
16.3
GFLOPs· 2021-03-29
CvT: Introducing Convolutions to Vision Transformers Code
#171InternImage-B
16
GFLOPs· 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions Code
#172LV-ViT-M
16
GFLOPs· 2021-04-22
All Tokens Matter: Token Labeling for Training Better Vision Transformers Code
#173MogaNet-L
15.9
GFLOPs· 2022-11-07
MogaNet: Multi-order Gated Aggregation Network Code
#174Assemble-ResNet152
15.8
GFLOPs· 2020-01-17
Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network Code
#175BossNet-T1
15.8
GFLOPs· 2021-03-23
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search Code
#176CoAtNet-2
15.7
GFLOPs· 2021-06-09
CoAtNet: Marrying Convolution and Attention for All Data Sizes Code
#177DaViT-B
15.5
GFLOPs· 2022-04-07
DaViT: Dual Attention Vision Transformers Code
#178ViT-S @384 (DeiT III)
15.5
GFLOPs· 2022-04-14
DeiT III: Revenge of the ViT Code
#179RDNet-B
15.4
GFLOPs· 2024-03-28
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Code
#180DeepMAD-89M
15.4
GFLOPs· 2023-03-05
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network Code
#181Shift-B
15.2
GFLOPs· 2022-01-26
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism Code
#182Twins-SVT-L
15.1
GFLOPs· 2021-04-28
Twins: Revisiting the Design of Spatial Attention in Vision Transformers Code
#183MambaVision-B
15
GFLOPs· 2024-07-10
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Code
#184Wave-ViT-L
14.8
GFLOPs· 2022-07-11
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning Code
#185GC ViT-B
14.8
GFLOPs· 2022-06-20
Global Context Vision Transformers Code
#186CAIT-XXS-36
14.3
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#187ZenNAS (0.8ms)
13.9
GFLOPs· 2021-02-01
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition Code
#188TinyViT-21M-384-distill (384 res, 21k)
13.8
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#189DiNAT-Base
13.7
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#190NAT-Base
13.7
GFLOPs· 2022-04-14
Neighborhood Attention Transformer Code
#191HRFormer-B
13.7
GFLOPs· 2021-10-18
HRFormer: High-Resolution Transformer for Dense Prediction Code
#192CAFormer-S18 (384 res, 21K)
13.4
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#193CAFormer-S18 (384 res)
13.4
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#194ViL-Base-D
13.4
GFLOPs· 2021-03-29
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Code
#195CAFormer-M36 (224 res, 21K)
13.2
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#196CAFormer-M36 (224 res)
13.2
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#197LITv2-B
13.2
GFLOPs· 2022-05-26
Fast Vision Transformers with HiLo Attention Code
#198GTP-DeiT-B/P8
13.1
GFLOPs· 2023-11-06
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation Code
#199CeiT-S (384 finetune res)
12.9
GFLOPs· 2021-03-22
Incorporating Convolution Designs into Visual Transformers Code
#200ConvFormer-M36 (224 res, 21K)
12.8
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#201ConvFormer-M36 (224 res)
12.8
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#202UniFormer-L
12.6
GFLOPs· 2022-01-24
UniFormer: Unifying Convolution and Self-attention for Visual Recognition Code
#203PiT-B
12.5
GFLOPs· 2021-03-30
Rethinking Spatial Dimensions of Vision Transformers Code
#204NFNet-F0
12.38
GFLOPs· 2021-02-11
High-Performance Large-Scale Image Recognition Without Normalization Code
#205CycleMLP-B5
12.3
GFLOPs· 2021-07-21
CycleMLP: A MLP-like Architecture for Dense Prediction Code
#206VAN-B4 (22K)
12.2
GFLOPs· 2022-02-20
Visual Attention Network Code
#207ViTAE-S-Stage
12
GFLOPs· 2021-06-07
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Code
#208PVTv2-B4
11.8
GFLOPs· 2021-06-25
PVT v2: Improved Baselines with Pyramid Vision Transformer Code
#209MaxViT-S (224res)
11.7
GFLOPs· 2022-04-04
MaxViT: Multi-Axis Vision Transformer Code
#210ConvFormer-S18 (384 res, 21K)
11.6
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#211ConvFormer-S18 (384 res)
11.6
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#212ResNet-152 (SOTA)
11.3
GFLOPs· 2015-12-10
Deep Residual Learning for Image Recognition Code
#213RepVGG-B2g4
11.3
GFLOPs· 2021-01-11
RepVGG: Making VGG-style ConvNets Great Again Code
#214ScaleNet-152
11.2
GFLOPs· 2019-04-20
Data-Driven Neuron Allocation for Scale Aggregation Networks Code
#215Sequencer2D-M
11.1
GFLOPs· 2022-05-04
Sequencer: Deep LSTM for Image Classification Code
#216CCT-14/7x2
11.06
GFLOPs· 2021-04-12
Escaping the Big Data Paradigm with Compact Transformers Code
#217EfficientViT-L2 (r288)
11
GFLOPs· 2022-05-29
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction Code
#218AutoFormer-base
11
GFLOPs· 2021-07-01
AutoFormer: Searching Transformers for Visual Recognition Code
#219BoTNet T4
10.9
GFLOPs· 2021-01-27
Bottleneck Transformers for Visual Recognition Code
#220ECA-Net (ResNet-152)
10.83
GFLOPs· 2019-10-08
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Code
#221VVT-L (224 res)
10.8
GFLOPs· 2022-06-21
Vicinity Vision Transformer Code
#222RevBiFPN-S4
10.6
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#223Transformer local-attention (NesT-S)
10.4
GFLOPs· 2021-05-26
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding Code
#224TransNeXt-Small (IN-1K supervised, 224)
10.3
GFLOPs· 2023-11-28
TransNeXt: Robust Foveal Visual Perception for Vision Transformers Code
#225ConViT-S+
10
GFLOPs· 2021-03-19
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases Code
#226MogaNet-B
9.9
GFLOPs· 2022-11-07
MogaNet: Multi-order Gated Aggregation Network Code
#227UniNet-B4
9.9
GFLOPs· 2021-10-08
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
#228EfficientNet-B5
9.9
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#229DeiT-S with iRPE-QKV
9.77
GFLOPs· 2021-07-29
Rethinking and Improving Relative Position Encoding for Vision Transformer Code
#230QnA-ViT-Base
9.7
GFLOPs· 2021-12-21
Learned Queries for Efficient Local Attention Code
#231T2T-ViT-14
9.6
GFLOPs· 2021-01-28
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Code
#232CAIT-XXS-24
9.6
GFLOPs· 2021-03-31
Going deeper with Image Transformers Code
#233CrossViT-18+
9.5
GFLOPs· 2021-03-27
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Code
#234DeiT-S with iRPE-QK
9.412
GFLOPs· 2021-07-29
Rethinking and Improving Relative Position Encoding for Vision Transformer Code
#235DAT-S++
9.4
GFLOPs· 2023-09-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention Code
#236CentroidViT-S (arXiv, 2021-02)
9.4
GFLOPs· 2021-02-17
Centroid Transformers: Learning to Abstract with Attention
#237DeiT-S with iRPE-K
9.318
GFLOPs· 2021-07-29
Rethinking and Improving Relative Position Encoding for Vision Transformer Code
#238SpineNet-143
9.1
GFLOPs· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#239DAT-S
9
GFLOPs· 2022-01-03
Vision Transformer with Deformable Attention Code
#240CrossViT-18
9
GFLOPs· 2021-03-27
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Code
#241Pyramid ViG-M
8.9
GFLOPs· 2022-06-01
Vision GNN: An Image is Worth Graph of Nodes Code
#242EfficientNetV2-S (21k)
8.8
GFLOPs· 2021-04-01
EfficientNetV2: Smaller Models and Faster Training Code
#243FasterViT-2
8.7
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#244RDNet-S
8.7
GFLOPs· 2024-03-28
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Code
#245ViL-Medium-D
8.7
GFLOPs· 2021-03-29
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Code
#246GFNet-H-B
8.6
GFLOPs· 2021-07-01
Global Filter Networks for Image Classification Code
#247GC ViT-S
8.5
GFLOPs· 2022-06-20
Global Context Vision Transformers Code
#248SE-CoTNetD-101
8.5
GFLOPs· 2021-07-26
Contextual Transformer Networks for Visual Recognition Code
#249Shift-S
8.5
GFLOPs· 2022-01-26
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism Code
#250SKNet-101
8.46
GFLOPs· 2019-03-15
Selective Kernel Networks Code
#251CoAtNet-1
8.4
GFLOPs· 2021-06-09
CoAtNet: Marrying Convolution and Attention for All Data Sizes Code
#252SCARLET-A4
8.4
GFLOPs· 2019-08-16
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search Code
#253Sequencer2D-S
8.4
GFLOPs· 2022-05-04
Sequencer: Deep LSTM for Image Classification Code
#254Next-ViT-B
8.3
GFLOPs· 2022-07-12
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios Code
#255Container
8.1
GFLOPs· 2021-06-02
Container: Context Aggregation Network Code
#256CAFormer-S36 (224 res, 21K)
8
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#257ELSA-VOLO-D1
8
GFLOPs· 2021-12-23
ELSA: Enhanced Local Self-Attention for Vision Transformer Code
#258CAFormer-S36 (224 res)
8
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#259InternImage-S
8
GFLOPs· 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions Code
#260GTP-LV-ViT-M/P8
8
GFLOPs· 2023-11-06
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation Code
#261RegNetY-8.0GF
8
GFLOPs· 2020-03-30
Designing Network Design Spaces Code
#262ResT-Large
7.9
GFLOPs· 2021-05-28
ResT: An Efficient Transformer for Visual Recognition Code
#263RandWire-WS
7.9
GFLOPs· 2019-04-02
Exploring Randomly Wired Neural Networks for Image Recognition Code
#264SGE-ResNet101
7.858
GFLOPs· 2019-05-23
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks Code
#265DiNAT-Small
7.8
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#266NAT-Small
7.8
GFLOPs· 2022-04-14
Neighborhood Attention Transformer Code
#267IPT-B
7.8
GFLOPs· 2022-12-06
IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation Code
#268MViT-B-16
7.8
GFLOPs· 2021-04-22
Multiscale Vision Transformers Code
#269ConvFormer-S36 (224 res, 21K)
7.6
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#270ConvFormer-S36 (224 res)
7.6
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#271ResNet-101
7.6
GFLOPs· 2015-12-10
Deep Residual Learning for Image Recognition Code
#272AOGNet-40M-AN
7.51
GFLOPs· 2019-08-04
Attentive Normalization Code
#273LITv2-M
7.5
GFLOPs· 2022-05-26
Fast Vision Transformers with HiLo Attention Code
#274MambaVision-S
7.5
GFLOPs· 2024-07-10
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Code
#275ScaleNet-101
7.5
GFLOPs· 2019-04-20
Data-Driven Neuron Allocation for Scale Aggregation Networks Code
#276ECA-Net (ResNet-101)
7.35
GFLOPs· 2019-10-08
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Code
#277BoTNet T3
7.3
GFLOPs· 2021-01-27
Bottleneck Transformers for Visual Recognition Code
#278Wave-ViT-B
7.2
GFLOPs· 2022-07-11
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning Code
#279CvT-21
7.1
GFLOPs· 2021-03-29
CvT: Introducing Convolutions to Vision Transformers Code
#280HCGNet-C
7.1
GFLOPs· 2019-08-26
Gated Convolutional Networks with Hybrid Connectivity for Image Classification Code
#281gSwin-S
7
GFLOPs· 2022-08-24
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
#282PVTv2-B3
6.9
GFLOPs· 2021-06-25
PVT v2: Improved Baselines with Pyramid Vision Transformer Code
#283ViTAE-13M
6.8
GFLOPs· 2021-06-07
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Code
#284RedNet-152
6.8
GFLOPs· 2021-03-10
Involution: Inverting the Inherence of Convolution for Visual Recognition Code
#285ViL-Base-W
6.74
GFLOPs· 2021-03-29
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Code
#286LV-ViT-S
6.6
GFLOPs· 2021-04-22
All Tokens Matter: Token Labeling for Training Better Vision Transformers Code
#287EfficientViT-B3 (r288)
6.5
GFLOPs· 2022-05-29
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction Code
#288CI2P-ViT
6.442
GFLOPs· 2025-02-14
Compress image to patches for Vision Transformer Code
#289CrossViT-15+
6.1
GFLOPs· 2021-03-27
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Code
#290ResMLP-S24
6
GFLOPs· 2021-05-07
ResMLP: Feedforward networks for image classification with data-efficient training Code
#291Next-ViT-S
5.8
GFLOPs· 2022-07-12
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios Code
#292Transformer local-attention (NesT-T)
5.8
GFLOPs· 2021-05-26
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding Code
#293CrossViT-15
5.8
GFLOPs· 2021-03-27
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Code
#294TransNeXt-Tiny (IN-1K supervised, 224)
5.7
GFLOPs· 2023-11-28
TransNeXt: Robust Foveal Visual Perception for Vision Transformers Code
#295MOAT-0 1K only
5.7
GFLOPs· 2022-10-04
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Code
#296MaxViT-T (224res)
5.6
GFLOPs· 2022-04-04
MaxViT: Multi-Axis Vision Transformer Code
#297ConViT-S
5.4
GFLOPs· 2021-03-19
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases Code
#298ResNeSt-50
5.39
GFLOPs· 2020-04-19
ResNeSt: Split-Attention Networks Code
#299EfficientViT-L1 (r224)
5.3
GFLOPs· 2022-05-29
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction Code
#300FasterViT-1
5.3
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#301MambaVision-T2
5.1
GFLOPs· 2024-07-10
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Code
#302AutoFormer-small
5.1
GFLOPs· 2021-07-01
AutoFormer: Searching Transformers for Visual Recognition Code
#303MogaNet-S
5
GFLOPs· 2022-11-07
MogaNet: Multi-order Gated Aggregation Network Code
#304RDNet-T
5
GFLOPs· 2024-03-28
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Code
#305VAN-B2
5
GFLOPs· 2022-02-20
Visual Attention Network Code
#306Visformer-S
4.9
GFLOPs· 2021-04-26
Visformer: The Vision-friendly Transformer Code
#307ViL-Small
4.86
GFLOPs· 2021-03-29
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Code
#308ELSA-Swin-T
4.8
GFLOPs· 2021-12-23
ELSA: Enhanced Local Self-Attention for Vision Transformer Code
#309GTP-LV-ViT-S/P8
4.8
GFLOPs· 2023-11-06
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation Code
#310LocalViT-PVT
4.8
GFLOPs· 2021-04-12
LocalViT: Bringing Locality to Vision Transformers Code
#311Wave-ViT-S
4.7
GFLOPs· 2022-07-11
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning Code
#312GC ViT-T
4.7
GFLOPs· 2022-06-20
Global Context Vision Transformers Code
#313IPT-S
4.7
GFLOPs· 2022-12-06
IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation Code
#314MViTv2-T
4.7
GFLOPs· 2021-12-02
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Code
#315RVT-S*
4.7
GFLOPs· 2021-05-17
Towards Robust Vision Transformer Code
#316RedNet-101
4.7
GFLOPs· 2021-03-10
Involution: Inverting the Inherence of Convolution for Visual Recognition Code
#317ResNet-RS-50 (160 image res)
4.6
GFLOPs· 2021-03-13
Revisiting ResNets: Improved Training and Scaling Strategies Code
#318Pyramid ViG-S
4.6
GFLOPs· 2022-06-01
Vision GNN: An Image is Worth Graph of Nodes Code
#319DAT-T
4.6
GFLOPs· 2022-01-03
Vision Transformer with Deformable Attention Code
#320LocalViT-S
4.6
GFLOPs· 2021-04-12
LocalViT: Bringing Locality to Vision Transformers Code
#321ViTAE-T-Stage
4.6
GFLOPs· 2021-06-07
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Code
#322ConvNeXt-T
4.5
GFLOPs· 2022-01-10
A ConvNet for the 2020s Code
#323CeiT-S
4.5
GFLOPs· 2021-03-22
Incorporating Convolution Designs into Visual Transformers Code
#324CvT-13
4.5
GFLOPs· 2021-03-29
CvT: Introducing Convolutions to Vision Transformers Code
#325Swin-T
4.5
GFLOPs· 2021-03-25
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Code
#326QnA-ViT-Small
4.4
GFLOPs· 2021-12-21
Learned Queries for Efficient Local Attention Code
#327MambaVision-T
4.4
GFLOPs· 2024-07-10
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Code
#328Shift-T
4.4
GFLOPs· 2022-01-26
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism Code
#329GLiT-Smalls
4.4
GFLOPs· 2021-07-07
GLiT: Neural Architecture Search for Global and Local Image Transformer Code
#330ResNeSt-50-fast
4.34
GFLOPs· 2020-04-19
ResNeSt: Split-Attention Networks Code
#331TinyViT-21M-distill (21k)
4.3
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#332DAT-T++
4.3
GFLOPs· 2023-09-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention Code
#333NAT-Tiny
4.3
GFLOPs· 2022-04-14
Neighborhood Attention Transformer Code
#334TinyViT-21M
4.3
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#335DiNAT-Tiny
4.3
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#336Mixer-S16 + STD
4.3
GFLOPs
No paper Code
#337EfficientNet-B4
4.2
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#338CoAtNet-0
4.2
GFLOPs· 2021-06-09
CoAtNet: Marrying Convolution and Attention for All Data Sizes Code
#339SGE-ResNet50
4.127
GFLOPs· 2019-05-23
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks Code
#340CAFormer-S18 (224 res, 21K)
4.1
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#341CAFormer-S18 (224 res)
4.1
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#342CvT-13-NAS
4.1
GFLOPs· 2021-03-29
CvT: Introducing Convolutions to Vision Transformers Code
#343SE-CoTNetD-50
4.1
GFLOPs· 2021-07-26
Contextual Transformer Networks for Visual Recognition Code
#344EfficientViT-B3 (r224)
4
GFLOPs· 2022-05-29
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction Code
#345CycleMLP-B2 + STD
4
GFLOPs
No paper Code
#346PVTv2-B2
4
GFLOPs· 2021-06-25
PVT v2: Improved Baselines with Pyramid Vision Transformer Code
#347ActiveMLP-T
4
GFLOPs· 2022-03-11
Active Token Mixer Code
#348RegNetY-4.0GF
4
GFLOPs· 2020-03-30
Designing Network Design Spaces Code
#349ViTAE-6M
4
GFLOPs· 2021-06-07
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Code
#350ConvFormer-S18 (224 res, 21K)
3.9
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#351ConvFormer-S18 (224 res)
3.9
GFLOPs· 2022-10-24
MetaFormer Baselines for Vision Code
#352ECA-Net (ResNet-50)
3.86
GFLOPs· 2019-10-08
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Code
#353ScaleNet-50
3.8
GFLOPs· 2019-04-20
Data-Driven Neuron Allocation for Scale Aggregation Networks Code
#354ResNet-50
3.8
GFLOPs· 2015-12-10
Deep Residual Learning for Image Recognition Code
#355LITv2-S
3.7
GFLOPs· 2022-05-26
Fast Vision Transformers with HiLo Attention Code
#356DY-ResNet-18
3.7
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#357UniFormer-S
3.6
GFLOPs· 2022-01-24
UniFormer: Unifying Convolution and Self-attention for Visual Recognition Code
#358gSwin-T
3.6
GFLOPs· 2022-08-24
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
#359CeiT-T (384 finetune res)
3.6
GFLOPs· 2021-03-22
Incorporating Convolution Designs into Visual Transformers Code
#360CAS-ViT-T
3.597
GFLOPs· 2024-08-07
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Code
#361EdgeFormer-S
3.48
GFLOPs· 2022-03-08
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer Code
#362ReXNet_3.0
3.4
GFLOPs· 2020-07-02
Rethinking Channel Dimensions for Efficient Model Design Code
#363GTP-DeiT-S/P8
3.4
GFLOPs· 2023-11-06
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation Code
#364RevBiFPN-S3
3.33
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#365FasterViT-0
3.3
GFLOPs· 2023-06-09
FasterViT: Fast Vision Transformers with Hierarchical Attention Code
#366Container-Light
3.2
GFLOPs· 2021-06-02
Container: Context Aggregation Network Code
#367ResMLP-12 (distilled, class-MLP)
3
GFLOPs· 2021-05-07
ResMLP: Feedforward networks for image classification with data-efficient training Code
#368ViTAE-T
3
GFLOPs· 2021-06-07
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Code
#369MobileOne-S4
2.978
GFLOPs· 2022-06-08
MobileOne: An Improved One millisecond Mobile Backbone Code
#370PiT-S
2.9
GFLOPs· 2021-03-30
Rethinking Spatial Dimensions of Vision Transformers Code
#371MobileOne-S4 (distill)
2.9
GFLOPs· 2022-06-08
MobileOne: An Improved One millisecond Mobile Backbone Code
#372TransNeXt-Micro (IN-1K supervised, 224)
2.7
GFLOPs· 2023-11-28
TransNeXt: Robust Foveal Visual Perception for Vision Transformers Code
#373NAT-Mini
2.7
GFLOPs· 2022-04-14
Neighborhood Attention Transformer Code
#374DiNAT-Mini
2.7
GFLOPs· 2022-09-29
Dilated Neighborhood Attention Transformer Code
#375RedNet-50
2.7
GFLOPs· 2021-03-10
Involution: Inverting the Inherence of Convolution for Visual Recognition Code
#376GC ViT-XT
2.6
GFLOPs· 2022-06-20
Global Context Vision Transformers Code
#377EdgeNeXt-S
2.6
GFLOPs· 2022-06-21
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications Code
#378LR-Net-26
2.6
GFLOPs· 2019-04-25
Local Relation Networks for Image Recognition Code
#379DeiT-Ti with iRPE-K
2.568
GFLOPs· 2021-07-29
Rethinking and Improving Relative Position Encoding for Vision Transformer Code
#380QnA-ViT-Tiny
2.5
GFLOPs· 2021-12-21
Learned Queries for Efficient Local Attention Code
#381VAN-B1
2.5
GFLOPs· 2022-02-20
Visual Attention Network Code
#382UniNet-B2
2.4
GFLOPs· 2021-10-08
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
#383HVT-S-1
2.4
GFLOPs· 2021-03-19
Scalable Vision Transformers with Hierarchical Pooling Code
#384LeViT-384
2.334
GFLOPs· 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference Code
#385IPT-T
2.3
GFLOPs· 2022-12-06
IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation Code
#386gSwin-VT
2.3
GFLOPs· 2022-08-24
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
#387RedNet-38
2.2
GFLOPs· 2021-03-10
Involution: Inverting the Inherence of Convolution for Visual Recognition Code
#388Ghost-ResNet-50 (s=2)
2.2
GFLOPs· 2019-11-27
GhostNet: More Features from Cheap Operations Code
#389FBNetV5-F-CLS
2.1
GFLOPs· 2021-11-19
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run
#390EfficientViT-B2 (r256)
2.1
GFLOPs· 2022-05-29
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction Code
#391GC ViT-XXT
2.1
GFLOPs· 2022-06-20
Global Context Vision Transformers Code
#392PVTv2-B1
2.1
GFLOPs· 2021-06-25
PVT v2: Improved Baselines with Pyramid Vision Transformer Code
#393TinyViT-11M-distill (21k)
2
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#394CloFormer-S
2
GFLOPs· 2023-03-31
Rethinking Local Perception in Lightweight Vision Transformer Code
#395TinyViT-11M
2
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#396HCGNet-B
2
GFLOPs· 2019-08-26
Gated Convolutional Networks with Hybrid Connectivity for Image Classification Code
#397ConViT-Ti+
2
GFLOPs· 2021-03-19
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases Code
#398ResT-Small
1.9
GFLOPs· 2021-05-28
ResT: An Efficient Transformer for Visual Recognition Code
#399MobileOne-S3
1.896
GFLOPs· 2022-06-08
MobileOne: An Improved One millisecond Mobile Backbone Code
#400CAS-ViT-M
1.887
GFLOPs· 2024-08-07
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Code
#401NASViT (supernet)
1.881
GFLOPs
No paper Code
#402MobileViTv3-1.0
1.876
GFLOPs· 2022-09-30
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features Code
#403MobileViTv3-S
1.841
GFLOPs· 2022-09-30
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features Code
#404DY-ResNet-10
1.82
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#405HRFormer-T
1.8
GFLOPs· 2021-10-18
HRFormer: High-Resolution Transformer for Dense Prediction Code
#406MobileViTv2-1.0
1.8
GFLOPs· 2022-06-06
Separable Self-attention for Mobile Vision Transformers Code
#407DVT (T2T-ViT-12)
1.7
GFLOPs· 2021-05-31
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition Code
#408Pyramid ViG-Ti
1.7
GFLOPs· 2022-06-01
Vision GNN: An Image is Worth Graph of Nodes Code
#409RedNet-26
1.7
GFLOPs· 2021-03-10
Involution: Inverting the Inherence of Convolution for Visual Recognition Code
#410FixEfficientNet-B0
1.6
GFLOPs· 2020-03-18
Fixing the train-test resolution discrepancy: FixEfficientNet Code
#411RegNetY-1.6GF
1.6
GFLOPs· 2020-03-30
Designing Network Design Spaces Code
#412ReXNet_2.0
1.5
GFLOPs· 2020-07-02
Rethinking Channel Dimensions for Efficient Model Design Code
#413MogaNet-T (256res)
1.44
GFLOPs· 2022-11-07
MogaNet: Multi-order Gated Aggregation Network Code
#414PiT-XS
1.4
GFLOPs· 2021-03-30
Rethinking Spatial Dimensions of Vision Transformers Code
#415GLiT-Tinys
1.4
GFLOPs· 2021-07-07
GLiT: Neural Architecture Search for Global and Local Image Transformer Code
#416LocalViT-TNT
1.4
GFLOPs· 2021-04-12
LocalViT: Bringing Locality to Vision Transformers Code
#417RevBiFPN-S2
1.37
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#418TinyViT-5M-distill (21k)
1.3
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#419RVT-Ti*
1.3
GFLOPs· 2021-05-17
Towards Robust Vision Transformer Code
#420TinyViT-5M
1.3
GFLOPs· 2022-07-21
TinyViT: Fast Pretraining Distillation for Small Vision Transformers Code
#421Visformer-Ti
1.3
GFLOPs· 2021-04-26
Visformer: The Vision-friendly Transformer Code
#422ViL-Tiny-RPB
1.3
GFLOPs· 2021-03-29
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Code
#423LocalViT-T
1.3
GFLOPs· 2021-04-12
LocalViT: Bringing Locality to Vision Transformers Code
#424AutoFormer-tiny
1.3
GFLOPs· 2021-07-01
AutoFormer: Searching Transformers for Visual Recognition Code
#425MobileOne-S2
1.299
GFLOPs· 2022-06-08
MobileOne: An Improved One millisecond Mobile Backbone Code
#426SReT-LT (Fast Knowledge Distillation)
1.2
GFLOPs· 2021-12-02
A Fast Knowledge Distillation Framework for Visual Recognition Code
#427CeiT-T
1.2
GFLOPs· 2021-03-22
Incorporating Convolution Designs into Visual Transformers Code
#428Ghost-ResNet-50 (s=4)
1.2
GFLOPs· 2019-11-27
GhostNet: More Features from Cheap Operations Code
#429LocalViT-T2T
1.2
GFLOPs· 2021-04-12
LocalViT: Bringing Locality to Vision Transformers Code
#430MobileNet-224 (CGD)
1.198
GFLOPs· 2019-07-23
Compact Global Descriptor for Neural Networks Code
#431MobileNetV2 (1.4)
1.17
GFLOPs· 2018-01-13
MobileNetV2: Inverted Residuals and Linear Bottlenecks Code
#432MobileNet-224 ×1.25
1.138
GFLOPs· 2017-04-17
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Code
#433CloFormer-XS
1.1
GFLOPs· 2023-03-31
Rethinking Local Perception in Lightweight Vision Transformer Code
#434SReT-T
1.1
GFLOPs· 2021-11-09
Sliced Recursive Transformer Code
#435LeViT-256
1.066
GFLOPs· 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference Code
#436MobileViTv3-0.75
1.064
GFLOPs· 2022-09-30
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features Code
#437MogaNet-XT (256res)
1.04
GFLOPs· 2022-11-07
MogaNet: Multi-order Gated Aggregation Network Code
#438FBNetV5-C-CLS
1
GFLOPs· 2021-11-19
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run
#439EfficientNet-B2
1
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#440MobileViTv2-0.75
1
GFLOPs· 2022-06-06
Separable Self-attention for Mobile Vision Transformers Code
#441ConViT-Ti
1
GFLOPs· 2021-03-19
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases Code
#442UniNet-B1
0.99
GFLOPs· 2021-10-08
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
#443CAS-ViT-S
0.932
GFLOPs· 2024-08-07
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Code
#444MobileViTv3-XS
0.927
GFLOPs· 2022-09-30
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features Code
#445VAN-B0
0.9
GFLOPs· 2022-02-20
Visual Attention Network Code
#446ReXNet_1.5
0.86
GFLOPs· 2020-07-02
Rethinking Channel Dimensions for Efficient Model Design Code
#447EfficientNet-B0 (CondConv)
0.826
GFLOPs· 2019-04-10
CondConv: Conditionally Parameterized Convolutions for Efficient Inference Code
#448MobileOne-S1
0.825
GFLOPs· 2022-06-08
MobileOne: An Improved One millisecond Mobile Backbone Code
#449ZenNet-400M-SE
0.82
GFLOPs· 2021-02-01
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition Code
#450MnasNet-A3
0.806
GFLOPs· 2018-07-31
MnasNet: Platform-Aware Neural Architecture Search for Mobile Code
#451RegNetY-800MF
0.8
GFLOPs· 2020-03-30
Designing Network Design Spaces Code
#452FairNAS-A
0.776
GFLOPs· 2019-07-03
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search Code
#453NASViT-A5
0.757
GFLOPs
No paper Code
#454SCARLET-A
0.73
GFLOPs· 2019-08-16
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search Code
#455FBNetV5
0.726
GFLOPs· 2021-11-19
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run
#456AlphaNet-A6
0.709
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#457DVT (T2T-ViT-10)
0.7
GFLOPs· 2021-05-31
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition Code
#458EfficientNet-B1
0.7
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#459MobileViT-XS
0.7
GFLOPs· 2021-10-05
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer Code
#460PiT-Ti
0.7
GFLOPs· 2021-03-30
Rethinking Spatial Dimensions of Vision Transformers Code
#461SReT-ExT
0.7
GFLOPs· 2021-11-09
Sliced Recursive Transformer Code
#462FairNAS-B
0.69
GFLOPs· 2019-07-03
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search Code
#463FBNetV5-A-CLS
0.685
GFLOPs· 2021-11-19
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run
#464MnasNet-A2
0.68
GFLOPs· 2018-07-31
MnasNet: Platform-Aware Neural Architecture Search for Mobile Code
#465ReXNet_1.3
0.66
GFLOPs· 2020-07-02
Rethinking Channel Dimensions for Efficient Model Design Code
#466SCARLET-B
0.658
GFLOPs· 2019-08-16
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search Code
#467FairNAS-C
0.642
GFLOPs· 2019-07-03
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search Code
#468HVT-Ti-1
0.64
GFLOPs· 2021-03-19
Scalable Vision Transformers with Hierarchical Pooling Code
#469MUXNet-l
0.636
GFLOPs· 2020-03-31
MUXConv: Information Multiplexing in Convolutional Neural Networks Code
#470LeViT-192
0.624
GFLOPs· 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference Code
#471MnasNet-A1
0.624
GFLOPs· 2018-07-31
MnasNet: Platform-Aware Neural Architecture Search for Mobile Code
#472RevBiFPN-S1
0.62
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#473MoGA-A
0.608
GFLOPs· 2019-08-04
MoGA: Searching Beyond MobileNetV3 Code
#474ESPNetv2
0.602
GFLOPs· 2018-11-28
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network Code
#475DVT (T2T-ViT-7)
0.6
GFLOPs· 2021-05-31
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition Code
#476CloFormer-XXS
0.6
GFLOPs· 2023-03-31
Rethinking Local Perception in Lightweight Vision Transformer Code
#477RegNetY-600MF
0.6
GFLOPs· 2020-03-30
Designing Network Design Spaces Code
#478MobileNetV2
0.6
GFLOPs· Extra Data· 2018-01-13
MobileNetV2: Inverted Residuals and Linear Bottlenecks Code
#479PVTv2-B0
0.6
GFLOPs· 2021-06-25
PVT v2: Improved Baselines with Pyramid Vision Transformer Code
#480ShuffleNet V2
0.597
GFLOPs· 2018-07-30
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design Code
#481NASViT-A4
0.591
GFLOPs
No paper Code
#482TinyNet (GhostNet-A)
0.591
GFLOPs· 2020-10-28
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets Code
#483RandWire-WS (small)
0.583
GFLOPs· 2019-04-02
Exploring Randomly Wired Neural Networks for Image Recognition Code
#484MixNet-L
0.565
GFLOPs· 2019-07-22
MixConv: Mixed Depthwise Convolutional Kernels Code
#485UniNet-B0
0.56
GFLOPs· 2021-10-08
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
#486CAS-ViT-XS
0.56
GFLOPs· 2024-08-07
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Code
#487SCARLET-C
0.56
GFLOPs· 2019-08-16
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search Code
#488UniNet-B0
0.555
GFLOPs· 2022-07-12
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP Code
#489DiCENet
0.553
GFLOPs· 2019-06-08
DiCENet: Dimension-wise Convolutions for Efficient Networks Code
#490NASViT-A3
0.528
GFLOPs
No paper Code
#491EdgeNeXt-XXS
0.522
GFLOPs· 2022-06-21
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications Code
#492MobileViTv2-0.5
0.5
GFLOPs· 2022-06-06
Separable Self-attention for Mobile Vision Transformers Code
#493AlphaNet-A5
0.491
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#494MobileViTv3-0.5
0.481
GFLOPs· 2022-09-30
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features Code
#495AlphaNet-A4
0.444
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#496MobileNet V3-Large 1.0
0.438
GFLOPs· 2019-05-06
Searching for MobileNetV3 Code
#497MUXNet-m
0.436
GFLOPs· 2020-03-31
MUXConv: Information Multiplexing in Convolutional Neural Networks Code
#498DY-MobileNetV2 ×0.75
0.435
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#499AsymmNet-Large ×1.0
0.4338
GFLOPs· 2021-04-15
AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks Code
#500NASViT-A2
0.421
GFLOPs
No paper Code
#501ReXNet_1.0
0.4
GFLOPs· 2020-07-02
Rethinking Channel Dimensions for Efficient Model Design Code
#502RegNetY-400MF
0.4
GFLOPs· 2020-03-30
Designing Network Design Spaces Code
#503DGPPF-ResNet50
0.4
GFLOPs
No paper Code
#504EfficientNet-B0
0.39
GFLOPs· 2019-05-28
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Code
#505LeViT-128
0.376
GFLOPs· 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference Code
#506FBNet-C
0.375
GFLOPs· 2018-12-09
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search Code
#507GreedyNAS-A
0.366
GFLOPs· 2020-03-25
GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet
#508SkipblockNet-L
0.364
GFLOPs· 2021-07-23
Bias Loss for Mobile Neural Networks Code
#509MixNet-M
0.36
GFLOPs· 2019-07-22
MixConv: Mixed Depthwise Convolutional Kernels Code
#510AlphaNet-A3
0.357
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#511ReXNet_0.9
0.35
GFLOPs· 2020-07-02
Rethinking Channel Dimensions for Efficient Model Design Code
#512TinyNet-A + RA
0.339
GFLOPs· 2020-10-28
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets Code
#513GreedyNAS-B
0.324
GFLOPs· 2020-03-25
GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet
#514ECA-Net (MobileNetV2)
0.32
GFLOPs· 2019-10-08
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Code
#515AlphaNet-A2
0.317
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#516RevBiFPN-S0
0.31
GFLOPs· 2022-06-28
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Code
#517NASViT-A1
0.309
GFLOPs
No paper Code
#518MobileViTv3-XXS
0.289
GFLOPs· 2022-09-30
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features Code
#519LeViT-128S
0.288
GFLOPs· 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference Code
#520GreedyNAS-C
0.284
GFLOPs· 2020-03-25
GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet
#521FBNetV5-AC-CLS
0.28
GFLOPs· 2021-11-19
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run
#522AlphaNet-A1
0.279
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#523MobileOne-S0 (distill)
0.275
GFLOPs· 2022-06-08
MobileOne: An Improved One millisecond Mobile Backbone Code
#524MixNet-S
0.256
GFLOPs· 2019-07-22
MixConv: Mixed Depthwise Convolutional Kernels Code
#525SkipblockNet-M
0.246
GFLOPs· 2021-07-23
Bias Loss for Mobile Neural Networks Code
#526MUXNet-s
0.234
GFLOPs· 2020-03-31
MUXConv: Information Multiplexing in Convolutional Neural Networks Code
#527GhostNet ×1.3
0.226
GFLOPs· 2019-11-27
GhostNet: More Features from Cheap Operations Code
#528FBNetV5-AR-CLS
0.215
GFLOPs· 2021-11-19
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run
#529CoE-Large + CondConv
0.214
GFLOPs· 2021-07-08
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs
#530NASViT-A0
0.208
GFLOPs
No paper Code
#531AlphaNet-A0
0.203
GFLOPs· 2021-02-16
AlphaNet: Improved Training of Supernets with Alpha-Divergence Code
#532DY-MobileNetV2 ×0.5
0.203
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#533DGPPF-ResNet18
0.2
GFLOPs
No paper Code
#534BasisNet-MV3
0.198
GFLOPs· 2021-05-07
BasisNet: Two-stage Model Synthesis for Efficient Inference
#535CoE-Large
0.194
GFLOPs· 2021-07-08
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs
#536GhostNet ×1.0
0.141
GFLOPs· 2019-11-27
GhostNet: More Features from Cheap Operations Code
#537DY-MobileNetV3-Small
0.137
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#538AsymmNet-Large ×0.5
0.1344
GFLOPs· 2021-04-15
AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks Code
#539MUXNet-xs
0.132
GFLOPs· 2020-03-31
MUXConv: Information Multiplexing in Convolutional Neural Networks Code
#540DY-MobileNetV2 ×0.35
0.124
GFLOPs· 2019-12-07
Dynamic Convolution: Attention over Convolution Kernels Code
#541AsymmNet-Small ×1.0
0.1154
GFLOPs· 2021-04-15
AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks Code
#542CoE-Small + CondConv + PWLU
0.1
GFLOPs· 2021-07-08
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs
#543DGPPF-MobileNetV2
0.1
GFLOPs
No paper Code
#544GhostNet ×0.5
0.042
GFLOPs· 2019-11-27
GhostNet: More Features from Cheap Operations Code