Anthropic/claude-3-7-sonnet

Reported on 2 benchmarks across 1 task

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing (2 results)

Question Answering on NewsQA
EM: 74.23 (best: 92.52, OpenAI/o3-2025-01-31-high)
F1: 82.3 (best: 94.01, Riple/Saanvi-v0.5-DeepAnalysis)