Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Notable Benchmarks

Curated benchmarks on common ML tasks

Computer Vision

CIFAR-10 Accuracy

Percentage correct (higher is better)

99.50 ViT-H/14
261 results, 237 with timeline

CIFAR-100 Accuracy

Percentage correct (higher is better)

96.08 EffNet-L2 (SAM)
208 results, 195 with timeline

CIFAR-10 FID

FID (lower is better)

1.22 GMem
92 results, 86 with timeline

ImageNet 256x256 FID

FID (lower is better)

1.06 SiT-XL/2 + UCGM-S (E2E-VAE +...
98 results, 95 with timeline

LSUN Bedroom 256x256 FID

FID (lower is better)

1.43 Diffusion ProjectedGAN
19 results, 15 with timeline

LSUN Churches 256x256 FID

FID (lower is better)

1.59 Projected GAN
26 results, 24 with timeline

ImageNet SSL Linear Probe

ImageNet Top-1 Accuracy (higher is better)

73.60 ResNet50
14 results, 14 with timeline

COCO Object Detection (mAP)

mAP (higher is better)

56.40 UniRepLKNet-XL++
15 results, 15 with timeline

ADE20K Semantic Segmentation (mIoU)

Validation mIoU (higher is better)

63.60 ViT-P (InternImage-H)
230 results, 224 with timeline

NLP

WikiText-103 Perplexity

Test perplexity (lower is better)

2.40 RETRO (7.5B)
85 results, 79 with timeline

LAMBADA Accuracy

Accuracy (higher is better)

89.70 PaLM-540B (Few-Shot)
34 results, 31 with timeline

One Billion Word PPL

PPL (lower is better)

20.09 MDLM (AR baseline)
25 results, 23 with timeline

GSM8K Accuracy

Accuracy (higher is better)

97.72 Claude 3.5 Sonnet (HPT)
164 results, 144 with timeline

MBPP Code Generation

Accuracy (higher is better)

96.60 EG-CFG (DeepSeek-V3-0324)
98 results, 94 with timeline

Browse all >4,000 tasks and benchmarks