Image Captioning on nocaps out-of-domain
Metric: B1 (BLEU-1; higher is better)
Results
| # | Model | B1 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | PaLI | 86.28 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 2 | GIT2, Single Model | 86.28 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 3 | GIT, Single Model | 85.99 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 4 | CoCa - Google Brain | 84.75 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 81.73 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | FudanFVL | 81.44 | No | - | - | - |
| 7 | Single Model | 80.89 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 8 | FudanWYZ | 80 | No | - | - | - |
| 9 | IEDA-LAB | 79.52 | No | - | - | - |
| 10 | MD | 76.81 | No | - | - | - |
| 11 | firethehole | 76.65 | No | - | - | - |
| 12 | vll@mk514 | 76.41 | No | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 76.2 | No | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 75.78 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 75.71 | No | - | - | - |
| 16 | evertyhing | 75.5 | No | - | - | - |
| 17 | Oscar | 74.98 | No | - | - | - |
| 18 | Human | 74.84 | No | - | - | - |
| 19 | vinvl_yuan_cbs | 73.95 | No | - | - | - |
| 20 | cxy_nocaps_training | 73.07 | No | - | - | - |
| 21 | UpDown-C | 72.94 | No | - | - | - |
| 22 | Xinyi | 72.53 | No | - | - | - |
| 23 | RCAL | 72.47 | No | - | - | - |
| 24 | UpDown + ELMo + CBS | 71.57 | No | - | - | - |
| 25 | camel XE | 71.34 | No | - | - | - |
| 26 | nocaps_training | 66.54 | No | - | - | - |
| 27 | UpDown | 66.54 | No | - | - | - |
| 28 | YX | 66.44 | No | - | - | - |
| 29 | B2 | 66.32 | No | - | - | - |
| 30 | 7_10-7_40000_predict_test.json | 66.14 | No | - | - | - |
| 31 | Neural Baby Talk + CBS | 65.98 | No | - | - | - |
| 32 | area_attention | 64.58 | No | - | - | - |
| 33 | Neural Baby Talk | 64.45 | No | - | - | - |
| 34 | CS395T | 63 | No | - | - | - |
| 35 | coco_all_19 | 61.62 | No | - | - | - |
| 36 | Yu-Wu | 60.95 | No | - | - | - |
| 37 | Check | 47.08 | No | - | - | - |

A dash means the page lists no paper, date, or code link for that entry. The page flags two entries as SOTA: GIT2, Single Model (#2) and Microsoft Cognitive Services team (#5).