Image Captioning on nocaps in-domain
Metric: B3 (higher is better)
Results
| # | Model | B3 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | GIT, Single Model (SOTA) | 60.53 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 2 | GIT2, Single Model | 59.94 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 3 | PaLI | 59.38 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 4 | CoCa - Google Brain | 58.01 | No | - | - | - |
| 5 | Microsoft Cognitive Services team (SOTA) | 55.94 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 52.96 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 7 | FudanFVL | 52.56 | No | - | - | - |
| 8 | IEDA-LAB | 51.89 | No | - | - | - |
| 9 | vll@mk514 | 51.26 | No | - | - | - |
| 10 | MD | 51.16 | No | - | - | - |
| 11 | FudanWYZ | 50.75 | No | - | - | - |
| 12 | firethehole | 50.5 | No | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 49.73 | No | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 49.68 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 15 | camel XE | 46.46 | No | - | - | - |
| 16 | RCAL | 45.33 | No | - | - | - |
| 17 | icgp2ssi1_coco_si_0.02_5_test | 44.65 | No | - | - | - |
| 18 | evertyhing | 43.92 | No | - | - | - |
| 19 | cxy_nocaps_training | 43.43 | No | - | - | - |
| 20 | 作者给的test文件 ("test file provided by the author") | 43.43 | No | - | - | - |
| 21 | Xinyi | 43.22 | No | - | - | - |
| 22 | Oscar | 42.86 | No | - | - | - |
| 23 | MQ-UpDown-C | 42.35 | No | - | - | - |
| 24 | UpDown | 41.5 | No | - | - | - |
| 25 | nocaps_training | 41.5 | No | - | - | - |
| 26 | B2 | 40.54 | No | - | - | - |
| 27 | UpDown + ELMo + CBS | 39.86 | No | - | - | - |
| 28 | YX | 39.28 | No | - | - | - |
| 29 | area_attention | 38.44 | No | - | - | - |
| 30 | 7_10-7_40000_predict_test.json | 37.85 | No | - | - | - |
| 31 | Human | 37.78 | No | - | - | - |
| 32 | None | 36.12 | No | - | - | - |
| 33 | Neural Baby Talk | 35.58 | No | - | - | - |
| 34 | coco_all_19 | 34.13 | No | - | - | - |
| 35 | Neural Baby Talk + CBS | 33.73 | No | - | - | - |
| 36 | Yu-Wu | 31.92 | No | - | - | - |
| 37 | CS395T | 29.57 | No | - | - | - |