Image Captioning on nocaps out-of-domain
Metric: B2 (higher is better)
Results
| # | Model | B2 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | GIT, Single Model | 71.28 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 2 | PaLI | 71.19 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 3 | GIT2, Single Model | 71.15 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 4 | CoCa - Google Brain | 70.24 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 65.48 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | FudanFVL | 64.71 | No | - | - | - |
| 7 | Single Model | 64.21 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 8 | FudanWYZ | 62.7 | No | - | - | - |
| 9 | IEDA-LAB | 61.01 | No | - | - | - |
| 10 | firethehole | 60.06 | No | - | - | - |
| 11 | MD | 57.39 | No | - | - | - |
| 12 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 57.25 | No | - | - | - |
| 13 | vll@mk514 | 56.87 | No | - | - | - |
| 14 | icgp2ssi1_coco_si_0.02_5_test | 56.39 | No | - | - | - |
| 15 | evertyhing | 56.14 | No | - | - | - |
| 16 | VinVL (Microsoft Cognitive Services + MSR) | 56.1 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 17 | Human | 53.9 | No | - | - | - |
| 18 | Oscar | 53.26 | No | - | - | - |
| 19 | vinvl_yuan_cbs | 52.76 | No | - | - | - |
| 20 | RCAL | 52.01 | No | - | - | - |
| 21 | UpDown-C | 51.36 | No | - | - | - |
| 22 | cxy_nocaps_training | 50.81 | No | - | - | - |
| 23 | camel XE | 50.32 | No | - | - | - |
| 24 | Xinyi | 49.99 | No | - | - | - |
| 25 | UpDown + ELMo + CBS | 48.58 | No | - | - | - |
| 26 | 7_10-7_40000_predict_test.json | 44.7 | No | - | - | - |
| 27 | nocaps_training | 44.28 | No | - | - | - |
| 28 | UpDown | 44.28 | No | - | - | - |
| 29 | B2 | 44.27 | No | - | - | - |
| 30 | Neural Baby Talk + CBS | 43.2 | No | - | - | - |
| 31 | Neural Baby Talk | 42.8 | No | - | - | - |
| 32 | YX | 42.47 | No | - | - | - |
| 33 | area_attention | 41.56 | No | - | - | - |
| 34 | CS395T | 39.71 | No | - | - | - |
| 35 | coco_all_19 | 38.55 | No | - | - | - |
| 36 | Yu-Wu | 38.3 | No | - | - | - |
| 37 | Check | 22.24 | No | - | - | - |

Marked SOTA on the page: #1 (GIT, Single Model) and #5 (Microsoft Cognitive Services team).