Image Captioning on nocaps in-domain
Metric: ROUGE-L (higher is better)
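For readers unfamiliar with the metric, here is a minimal sketch of sentence-level ROUGE-L: an F-measure built on the longest common subsequence (LCS) between a candidate caption and a reference caption. The function names, the lowercase whitespace tokenization, the single-reference setup, and the default beta = 1.2 are all assumptions made for illustration. The page does not describe the leaderboard's actual scoring pipeline, which may differ in tokenization, multi-reference handling, and weighting.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS found without them.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of the two sub-problems.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """Sentence-level ROUGE-L F-measure in [0, 1] for one reference.

    Assumptions (not taken from the leaderboard): lowercase whitespace
    tokenization and beta = 1.2, which weights recall above precision.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    score = rouge_l("a dog runs on the beach", "a dog is running on the beach")
    print(f"ROUGE-L: {score:.4f}")
```

The sketch returns a value between 0 and 1, while the scores listed on this page appear to be on a 0 to 100 scale.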
Results
| # | Model | ROUGE-L | Extra Data | Paper | Date | Code | SOTA badge |
|---|-------|---------|------------|-------|------|------|------------|
| 1 | PaLI | 64.39 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code | SOTA |
| 2 | GIT, Single Model | 64.02 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code | SOTA |
| 3 | GIT2, Single Model | 63.82 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code | - |
| 4 | CoCa - Google Brain | 63.12 | No | - | - | - | - |
| 5 | Microsoft Cognitive Services team | 62.48 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - | SOTA |
| 6 | Single Model | 61.01 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code | - |
| 7 | FudanFVL | 60.52 | No | - | - | - | - |
| 8 | IEDA-LAB | 60.07 | No | - | - | - | - |
| 9 | vll@mk514 | 59.75 | No | - | - | - | - |
| 10 | FudanWYZ | 59.67 | No | - | - | - | - |
| 11 | MD | 59.67 | No | - | - | - | - |
| 12 | firethehole | 59.54 | No | - | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 58.62 | No | - | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 58.54 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code | - |
| 15 | camel XE | 56.84 | No | - | - | - | - |
| 16 | RCAL | 56.76 | No | - | - | - | - |
| 17 | icgp2ssi1_coco_si_0.02_5_test | 56.4 | No | - | - | - | - |
| 18 | Oscar | 55.91 | No | - | - | - | - |
| 19 | evertyhing | 55.88 | No | - | - | - | - |
| 20 | MQ-UpDown-C | 55.25 | No | - | - | - | - |
| 21 | cxy_nocaps_training | 55.06 | No | - | - | - | - |
| 22 | 作者给的test文件 (test file provided by the authors) | 55.06 | No | - | - | - | - |
| 23 | Xinyi | 55.03 | No | - | - | - | - |
| 24 | UpDown | 54.42 | No | - | - | - | - |
| 25 | nocaps_training | 54.42 | No | - | - | - | - |
| 26 | UpDown + ELMo + CBS | 53.98 | No | - | - | - | - |
| 27 | B2 | 53.49 | No | - | - | - | - |
| 28 | Human | 53.47 | No | - | - | - | - |
| 29 | YX | 53.22 | No | - | - | - | - |
| 30 | area_attention | 52.53 | No | - | - | - | - |
| 31 | 7_10-7_40000_predict_test.json | 52.44 | No | - | - | - | - |
| 32 | None | 52.26 | No | - | - | - | - |
| 33 | Neural Baby Talk | 51.42 | No | - | - | - | - |
| 34 | Neural Baby Talk + CBS | 50.84 | No | - | - | - | - |
| 35 | coco_all_19 | 50.53 | No | - | - | - | - |
| 36 | Yu-Wu | 49.64 | No | - | - | - | - |
| 37 | CS395T | 49.05 | No | - | - | - | - |