Image Captioning on nocaps near-domain
Metric: B2 (BLEU-2; higher is better)
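B2 presumably denotes BLEU-2 as computed by the COCO caption evaluation toolkit used for nocaps scoring. The sketch below computes corpus-level BLEU-2: the geometric mean of clipped unigram and bigram precision, multiplied by a brevity penalty based on the closest reference length. It is a simplified illustration under these assumptions. It works on already-tokenized captions, whereas the official scorer applies its own tokenization, so its numbers will not reproduce leaderboard values exactly. The function names `ngram_counts` and `corpus_bleu2` are hypothetical, and the leaderboard reports the score multiplied by 100.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in one token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu2(hypotheses: List[List[str]],
                 references: List[List[List[str]]]) -> float:
    """Hypothetical corpus-level BLEU-2 with uniform weights (1/2, 1/2).

    hypotheses[i] is one tokenized candidate caption; references[i] is the
    list of tokenized reference captions for the same image.
    """
    matched = [0, 0]  # clipped n-gram matches for n = 1, 2
    total = [0, 0]    # candidate n-gram counts for n = 1, 2
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in (1, 2):
            hyp_counts = ngram_counts(hyp, n)
            # Clip each n-gram by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matched) == 0:
        return 0.0  # geometric mean is zero if any precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 2
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyp = "a dog runs on the grass".split()
refs = ["a dog is running on the grass".split(), "the dog runs across grass".split()]
print(round(100 * corpus_bleu2([hyp], [refs]), 2))  # prints 89.44
```

In this toy example all 6 unigrams match and 4 of 5 bigrams match. The closest reference length (5) is shorter than the candidate (6), so no brevity penalty applies, and the score is sqrt(1 × 0.8) ≈ 0.894, or 89.44 on the leaderboard's 0-100 scale.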
Results
| # | Model | B2 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GIT2, Single Model | 75.86 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 2 | PaLI | 75.56 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 3 | GIT, Single Model | 75.48 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 4 | CoCa - Google Brain | 74.49 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 72.6 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 69.83 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 7 | FudanFVL | 69.66 | No | - | - | - |
| 8 | IEDA-LAB | 68.58 | No | - | - | - |
| 9 | FudanWYZ | 68.56 | No | - | - | - |
| 10 | MD | 67.99 | No | - | - | - |
| 11 | VinVL (Microsoft Cognitive Services + MSR) | 66.94 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 12 | firethehole | 66.65 | No | - | - | - |
| 13 | vll@mk514 | 66.55 | No | - | - | - |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 65.88 | No | - | - | - |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 63.01 | No | - | - | - |
| 16 | evertyhing | 62.73 | No | - | - | - |
| 17 | Oscar | 62.32 | No | - | - | - |
| 18 | vinvl_yuan_cbs | 62.31 | No | - | - | - |
| 19 | RCAL | 62.26 | No | - | - | - |
| 20 | camel XE | 62.06 | No | - | - | - |
| 21 | cxy_nocaps_training | 60.75 | No | - | - | - |
| 22 | Xinyi | 60.52 | No | - | - | - |
| 23 | MQ-UpDown-C | 59 | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 58.31 | No | - | - | - |
| 25 | Human | 56.97 | No | - | - | - |
| 26 | nocaps_training | 56.93 | No | - | - | - |
| 27 | UpDown | 56.93 | No | - | - | - |
| 28 | B2 | 55.53 | No | - | - | - |
| 29 | 7_10-7_40000_predict_test.json | 54.26 | No | - | - | - |
| 30 | Neural Baby Talk | 54.1 | No | - | - | - |
| 31 | YX | 53.98 | No | - | - | - |
| 32 | None | 53.74 | No | - | - | - |
| 33 | Neural Baby Talk + CBS | 53.67 | No | - | - | - |
| 34 | area_attention | 53.56 | No | - | - | - |
| 35 | coco_all_19 | 50.79 | No | - | - | - |
| 36 | CS395T | 48.92 | No | - | - | - |
| 37 | Yu-Wu | 48.7 | No | - | - | - |