Claude 3 Opus

Reported on 8 benchmarks across 2 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing (8 results)

Code Generation on MBPP
Accuracy
86.4
best: 96.6 (EG-CFG (DeepSeek-V3-0324))
Long-Context Understanding on MMNeedle
1 Image, 2*2 Stitching, Exact Accuracy
52.25
best: 94.6 (GPT-4o)
Long-Context Understanding on MMNeedle
1 Image, 4*4 Stitching, Exact Accuracy
12.3
best: 83 (GPT-4o)
Long-Context Understanding on MMNeedle
1 Image, 8*8 Stitching, Exact Accuracy
1.6
best: 29.81 (Gemini Pro 1.5)
Long-Context Understanding on MMNeedle
10 Images, 1*1 Stitching, Exact Accuracy
66.93
best: 97 (GPT-4o)
Long-Context Understanding on MMNeedle
10 Images, 2*2 Stitching, Exact Accuracy
4.6
best: 81.8 (GPT-4o)
Long-Context Understanding on MMNeedle
10 Images, 4*4 Stitching, Exact Accuracy
0.4
best: 26.9 (GPT-4o)
Long-Context Understanding on MMNeedle
10 Images, 8*8 Stitching, Exact Accuracy
0
best: 1 (GPT-4o)