Image Captioning on nocaps out-of-domain
Metric: ROUGE-L (higher is better)
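For readers unfamiliar with the metric, the following is a minimal sketch of a sentence-level ROUGE-L score: an F-measure built on the longest common subsequence (LCS) between a candidate caption and each reference. The function names, the whitespace tokenization, the default `beta=1.0`, and taking the best score over references are all assumptions made for illustration. The benchmark's official evaluation may tokenize, weight, and aggregate differently, and the leaderboard values appear to be on a 0-100 scale rather than 0-1.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, references, beta=1.0):
    """Best LCS-based F-score of a candidate against any reference.

    Assumptions (not taken from the leaderboard): lowercase whitespace
    tokenization, beta=1.0, and max over references.
    """
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    best = 0.0
    for ref in references:
        ref_tokens = ref.lower().split()
        if not ref_tokens:
            continue
        lcs = lcs_length(cand, ref_tokens)
        if lcs == 0:
            continue
        precision = lcs / len(cand)
        recall = lcs / len(ref_tokens)
        f_score = (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
        best = max(best, f_score)
    return best
```

A call such as `rouge_l("a dog runs fast", ["a dog runs"])` compares the candidate with a single reference; multiplying the result by 100 gives a number on the same apparent scale as the table.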
Results
| # | Model | ROUGE-L | Extra Data | Paper | Date | Code |
|---|-------|---------|------------|-------|------|------|
| 1 | PaLI | 61.35 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 2 | GIT, Single Model | 60.96 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 3 | GIT2, Single Model | 60.91 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 4 | CoCa - Google Brain | 60.57 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 57.57 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | FudanFVL | 57.29 | No | - | - | - |
| 7 | Single Model | 56.69 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 8 | FudanWYZ | 56.41 | No | - | - | - |
| 9 | firethehole | 55.08 | No | - | - | - |
| 10 | IEDA-LAB | 55 | No | - | - | - |
| 11 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 52.86 | No | - | - | - |
| 12 | MD | 52.54 | No | - | - | - |
| 13 | vll@mk514 | 52.51 | No | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 51.99 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 51.75 | No | - | - | - |
| 16 | evertyhing | 51.54 | No | - | - | - |
| 17 | Human | 51.5 | No | - | - | - |
| 18 | Oscar | 50 | No | - | - | - |
| 19 | vinvl_yuan_cbs | 49.5 | No | - | - | - |
| 20 | camel XE | 48.85 | No | - | - | - |
| 21 | RCAL | 48.81 | No | - | - | - |
| 22 | UpDown-C | 48.6 | No | - | - | - |
| 23 | cxy_nocaps_training | 47.53 | No | - | - | - |
| 24 | Xinyi | 47.23 | No | - | - | - |
| 25 | UpDown + ELMo + CBS | 47.13 | No | - | - | - |
| 26 | 7_10-7_40000_predict_test.json | 45.72 | No | - | - | - |
| 27 | nocaps_training | 44.84 | No | - | - | - |
| 28 | UpDown | 44.84 | No | - | - | - |
| 29 | Neural Baby Talk + CBS | 44.47 | No | - | - | - |
| 30 | B2 | 44.37 | No | - | - | - |
| 31 | YX | 44.23 | No | - | - | - |
| 32 | Neural Baby Talk | 44.11 | No | - | - | - |
| 33 | area_attention | 43.59 | No | - | - | - |
| 34 | CS395T | 43.02 | No | - | - | - |
| 35 | Yu-Wu | 42.46 | No | - | - | - |
| 36 | coco_all_19 | 41.58 | No | - | - | - |
| 37 | Check | 31.57 | No | - | - | - |

Entries carrying a SOTA badge on the page: #1 PaLI, #2 GIT, Single Model, and #5 Microsoft Cognitive Services team.