Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GPT-4o

Reported on 45 benchmarks across 11 tasks · 5 papers · 23 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
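The matching rule in the note above can be illustrated with a minimal sketch. The record layout, field names, and the `group_by_exact_name` helper are assumptions made for illustration, not the site's actual implementation; the two Vinoground values and the FrontierMath value are taken from this page.

```python
from collections import defaultdict

# Hypothetical result records; the field names are assumptions for illustration.
results = [
    {"model": "GPT-4o", "benchmark": "Vinoground", "metric": "Group Score", "value": 24.6},
    {"model": "GPT-4o (CoT)", "benchmark": "Vinoground", "metric": "Group Score", "value": 35.0},
    {"model": "GPT-4o", "benchmark": "FrontierMath", "metric": "Accuracy", "value": 0.01},
]


def group_by_exact_name(records):
    """Group records by the model string exactly as reported (no normalization)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["model"]].append(record)
    return dict(grouped)


pages = group_by_exact_name(results)
for name, rows in pages.items():
    print(name, len(rows))
```

Under exact matching, "GPT-4o" and "GPT-4o (CoT)" land in separate groups, while two papers that both write "GPT-4o" for different snapshots would be merged into one.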

Natural Language Processing · 37 results

Visual Question Answering (VQA) on VLM2-Bench
Average Score on VLM2-bench (9 subtasks) · 2024-10-25
60.36
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
GC-mat · 2024-10-25
37.45
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
GC-trk · 2024-10-25
39.27
best: 43.38 (Qwen2.5-VL-7B)
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
OC-cnt · 2024-10-25
80.62
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
OC-cpr · 2024-10-25
74.17
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
OC-grp · 2024-10-25
57.5
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
PC-VID · 2024-10-25
66.75
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
PC-cnt · 2024-10-25
90.5
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on 6-DoF SpatialBench
Orientation-rel · 2024-10-25
44.2
best: 54.6 (SoFar)
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on 6-DoF SpatialBench
Total · 2024-10-25
36.2
best: 43.9 (SoFar)
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering on 6-DoF SpatialBench
Orientation-rel · 2024-10-25
44.2
best: 54.6 (SoFar)
SOTA
GPT-4o System Card arXiv:2410.21276
Visual Question Answering on 6-DoF SpatialBench
Total · 2024-10-25
36.2
best: 43.9 (SoFar)
SOTA
GPT-4o System Card arXiv:2410.21276
Long-Context Understanding on MMNeedle
1 Image, 2*2 Stitching, Exact Accuracy · 2023-03-15
94.6
SOTA
GPT-4 Technical Report arXiv:2303.08774
Long-Context Understanding on MMNeedle
1 Image, 4*4 Stitching, Exact Accuracy · 2023-03-15
83
SOTA
GPT-4 Technical Report arXiv:2303.08774
Long-Context Understanding on MMNeedle
1 Image, 8*8 Stitching, Exact Accuracy · 2023-03-15
19
best: 29.81 (Gemini Pro 1.5)
SOTA
GPT-4 Technical Report arXiv:2303.08774
Long-Context Understanding on MMNeedle
10 Images, 1*1 Stitching, Exact Accuracy · 2023-03-15
97
SOTA
GPT-4 Technical Report arXiv:2303.08774
Long-Context Understanding on MMNeedle
10 Images, 2*2 Stitching, Exact Accuracy · 2023-03-15
81.8
SOTA
GPT-4 Technical Report arXiv:2303.08774
Long-Context Understanding on MMNeedle
10 Images, 4*4 Stitching, Exact Accuracy · 2023-03-15
26.9
SOTA
GPT-4 Technical Report arXiv:2303.08774
Long-Context Understanding on MMNeedle
10 Images, 8*8 Stitching, Exact Accuracy · 2023-03-15
1
SOTA
GPT-4 Technical Report arXiv:2303.08774
Description-guided molecule generation on TOMG-Bench
wAcc · 2024-12-19
32.29
best: 35.92 (Claude-3.5)
TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation arXiv:2412.14642
Visual Question Answering (VQA) on VLM2-Bench
PC-cpr · 2024-10-25
50
best: 80 (Qwen2.5-VL-7B)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on VLM2-Bench
PC-grp · 2024-10-25
47
best: 69 (Qwen2.5-VL-7B)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on 6-DoF SpatialBench
Orientation-abs · 2024-10-25
25.8
best: 31.3 (SoFar)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on 6-DoF SpatialBench
Position-abs · 2024-10-25
28.4
best: 33.8 (SoFar)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering (VQA) on 6-DoF SpatialBench
Position-rel · 2024-10-25
49.4
best: 59.6 (SoFar)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering on 6-DoF SpatialBench
Orientation-abs · 2024-10-25
25.8
best: 31.3 (SoFar)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering on 6-DoF SpatialBench
Position-abs · 2024-10-25
28.4
best: 33.8 (SoFar)
GPT-4o System Card arXiv:2410.21276
Visual Question Answering on 6-DoF SpatialBench
Position-rel · 2024-10-25
49.4
best: 59.6 (SoFar)
GPT-4o System Card arXiv:2410.21276
Question Answering on Video-MME (w/o subs)
Accuracy (%) · 2024-06-14
70.3
best: 77.4 (Video-RAG (based on LLaVA-Video))
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781
Question Answering on Zero-shot Video Question Answering on LongVideoBench
Accuracy (%) · uses extra data · 2024-06-14
64
best: 66.7 (Gemini 1.5 Pro)
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781
Question Answering on Video-MME
Accuracy (%) · 2024-06-14
77.2
best: 81.3 (Gemini 1.5 Pro)
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781
Relation Extraction on Vinoground
Group Score
24.6
best: 35 (GPT-4o (CoT))
Relation Extraction on Vinoground
Text Score
54
best: 59.2 (GPT-4o (CoT))
Relation Extraction on Vinoground
Video Score
38.2
best: 51 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Group Score
24.6
best: 35 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Text Score
54
best: 59.2 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Video Score
38.2
best: 51 (GPT-4o (CoT))

Methodology · 3 results

Optical Character Recognition (OCR) on VideoDB's OCR Benchmark Public Collection
Average Accuracy · 2025-02-10
76.22
SOTA
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments arXiv:2502.06445
Optical Character Recognition (OCR) on VideoDB's OCR Benchmark Public Collection
Character Error Rate (CER) · 2025-02-10
0.2378
SOTA
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments arXiv:2502.06445
Optical Character Recognition (OCR) on VideoDB's OCR Benchmark Public Collection
Word Error Rate (WER) · 2025-02-10
0.5117
best: 0.2385 (Gemini-1.5 Pro)
SOTA
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments arXiv:2502.06445

Reasoning · 3 results

Video Question Answering on Video-MME (w/o subs)
Accuracy (%) · 2024-06-14
70.3
best: 77.4 (Video-RAG (based on LLaVA-Video))
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781
Video Question Answering on Zero-shot Video Question Answering on LongVideoBench
Accuracy (%) · uses extra data · 2024-06-14
64
best: 66.7 (Gemini 1.5 Pro)
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781
Video Question Answering on Video-MME
Accuracy (%) · 2024-06-14
77.2
best: 81.3 (Gemini 1.5 Pro)
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781

Computer Vision · 1 result

MMR total on MRR-Benchmark
Total Column Score · uses extra data · 2024-06-14
457
best: 463 (Claude 3.5 Sonnet)
SOTA
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding arXiv:2406.09781

Knowledge Base · 1 result

Mathematical Reasoning on FrontierMath
Accuracy
0.01
best: 0.252 (o3)