gpt-4o-2024-08-06

Reported on 6 benchmarks across 3 tasks · 3 papers · 1 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 4 results

Code Generation on WebApp1K-React
pass@1 · 2024-09-08
0.885
best: 0.952 (o1-preview)
SOTA
Insights from Benchmarking Frontier Language Models on Web App Code Generation arXiv:2409.05177
Code Generation on WebApp1k-Duo-React
pass@1 · 2024-09-19
0.531
best: 0.679 (claude-3-5-sonnet)
A Case Study of Web App Coding with OpenAI Reasoning Models arXiv:2409.13773
Entity Resolution on Abt-Buy
F1 (%) · 2024-09-12
92.2
best: 95.78 (gpt4-0613_zeroshot)
Fine-tuning Large Language Models for Entity Matching arXiv:2409.08185
Entity Resolution on Amazon-Google
F1 (%) · 2024-09-12
63.45
best: 85.21 (gpt4-0613_fewshot-10)
Fine-tuning Large Language Models for Entity Matching arXiv:2409.08185

Knowledge Base · 2 results

Data Integration on Abt-Buy
F1 (%) · 2024-09-12
92.2
best: 95.78 (gpt4-0613_zeroshot)
Fine-tuning Large Language Models for Entity Matching arXiv:2409.08185
Data Integration on Amazon-Google
F1 (%) · 2024-09-12
63.45
best: 85.21 (gpt4-0613_fewshot-10)
Fine-tuning Large Language Models for Entity Matching arXiv:2409.08185