Metric: Top 1 Accuracy (higher is better)
| # | Model | Top 1 Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | AIMv2-3B (448 res) | 85.9 | No | Multimodal Autoregressive Pre-training of Large ... | 2024-11-21 | Code |
| 2 | Hiera-H (448px) | 83.8 | Yes | Hiera: A Hierarchical Vision Transformer without... | 2023-06-01 | Code |
| 3 | MAE (ViT-H, 448) | 83.4 | Yes | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 4 | AIMv2-3B | 81.5 | No | Multimodal Autoregressive Pre-training of Large ... | 2024-11-21 | Code |
| 5 | ViT-NeT (SwinV2-B) | 81.2 | No | - | - | Code |
| 6 | AIMv2-1B | 79.7 | No | Multimodal Autoregressive Pre-training of Large ... | 2024-11-21 | Code |
| 7 | AIMv2-H | 77.9 | No | Multimodal Autoregressive Pre-training of Large ... | 2024-11-21 | Code |
| 8 | AIMv2-L | 76.0 | No | Multimodal Autoregressive Pre-training of Large ... | 2024-11-21 | Code |
| 9 | FixSENet-154 | 75.4 | Yes | Fixing the train-test resolution discrepancy | 2019-06-14 | Code |
| 10 | SEB+EfficientNet-B5 | 72.3 | No | On the Eigenvalues of Global Covariance Pooling ... | 2022-05-26 | Code |
| 11 | TransFG | 71.7 | No | TransFG: A Transformer Architecture for Fine-gra... | 2021-03-14 | Code |
| 12 | TASN | 68.2 | No | Looking for the Devil in the Details: Learning T... | 2019-03-14 | Code |