Claude 3 Opus
Reported on 8 benchmarks across 2 tasks
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing (8 results)
- Accuracy: 86.4 (best: 96.6, EG-CFG (DeepSeek-V3-0324))
- 1 Image, 2*2 Stitching, Exact Accuracy: 52.25 (best: 94.6, GPT-4o)
- 1 Image, 4*4 Stitching, Exact Accuracy: 12.3 (best: 83, GPT-4o)
- 1 Image, 8*8 Stitching, Exact Accuracy: 1.6 (best: 29.81, Gemini Pro 1.5)
- 10 Images, 1*1 Stitching, Exact Accuracy: 66.93 (best: 97, GPT-4o)
- 10 Images, 2*2 Stitching, Exact Accuracy: 4.6 (best: 81.8, GPT-4o)
- 10 Images, 4*4 Stitching, Exact Accuracy: 0.4 (best: 26.9, GPT-4o)
- 10 Images, 8*8 Stitching, Exact Accuracy: 0 (best: 1, GPT-4o)