Claude Instant 1.1 (few-shot, k=5)

Reported on 2 benchmarks across 2 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing: 2 results

Question Answering on TriviaQA
EM: 78.9
best: 87.5 (Claude 2 (few-shot, k=5))

Common Sense Reasoning on ARC (Challenge)
Accuracy: 85.7
best: 96.4 (GPT-4 (few-shot, k=25))