Image Captioning on nocaps in-domain
Metric: B1 (higher is better)
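For readers unfamiliar with the metric, the following is a minimal sketch that assumes B1 denotes BLEU-1, i.e. clipped unigram precision multiplied by a brevity penalty and scaled to 0-100. The function name `bleu1`, the lowercasing, and the whitespace tokenization are illustrative assumptions, and the score is computed for a single caption. The leaderboard numbers come from the benchmark's own evaluation, which may tokenize and aggregate differently, so this sketch is not expected to reproduce them.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1 sketch: clipped unigram precision x brevity penalty.

    Hypothetical helper for illustration; tokenization is plain whitespace split.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each word, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in refs:
        for word, n in Counter(ref).items():
            max_ref[word] = max(max_ref[word], n)

    # Clip candidate counts so repeated words cannot be over-rewarded.
    clipped = sum(min(n, max_ref[word]) for word, n in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate length
    # (ties resolved toward the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))

    return 100.0 * bp * precision


score = bleu1(
    "a dog runs on the grass",
    ["a dog is running on the grass", "the dog runs across a field"],
)
print(round(score, 2))
```

Clipping is what keeps a degenerate caption such as "the the the" from scoring well, and the brevity penalty does the same for captions that are much shorter than every reference.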
Results
| # | Model | B1 | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | GIT2, Single Model | 88.86 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 2 | GIT, Single Model | 88.55 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 3 | PaLI | 88.02 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 4 | CoCa - Google Brain | 87.27 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 86.33 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 84.64 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 7 | IEDA-LAB | 84.4 | No | - | - | - |
| 8 | FudanFVL | 84.2 | No | - | - | - |
| 9 | MD | 84.03 | No | - | - | - |
| 10 | vll@mk514 | 83.77 | No | - | - | - |
| 11 | VinVL (Microsoft Cognitive Services + MSR) | 83.24 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 12 | FudanWYZ | 82.91 | No | - | - | - |
| 13 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 82.9 | No | - | - | - |
| 14 | firethehole | 81.86 | No | - | - | - |
| 15 | cxy_nocaps_training | 81.64 | No | - | - | - |
| 16 | 作者给的test文件 | 81.64 | No | - | - | - |
| 17 | Xinyi | 81.61 | No | - | - | - |
| 18 | Oscar | 80.7 | No | - | - | - |
| 19 | RCAL | 80.68 | No | - | - | - |
| 20 | camel XE | 80.5 | No | - | - | - |
| 21 | icgp2ssi1_coco_si_0.02_5_test | 80.26 | No | - | - | - |
| 22 | evertyhing | 79.58 | No | - | - | - |
| 23 | MQ-UpDown-C | 78.73 | No | - | - | - |
| 24 | UpDown | 77.68 | No | - | - | - |
| 25 | nocaps_training | 77.68 | No | - | - | - |
| 26 | UpDown + ELMo + CBS | 77.65 | No | - | - | - |
| 27 | B2 | 77.06 | No | - | - | - |
| 28 | Human | 76.89 | No | - | - | - |
| 29 | Neural Baby Talk + CBS | 76.49 | No | - | - | - |
| 30 | YX | 76.48 | No | - | - | - |
| 31 | area_attention | 76.12 | No | - | - | - |
| 32 | Neural Baby Talk | 75.91 | No | - | - | - |
| 33 | 7_10-7_40000_predict_test.json | 75.31 | No | - | - | - |
| 34 | None | 74.35 | No | - | - | - |
| 35 | coco_all_19 | 72.76 | No | - | - | - |
| 36 | CS395T | 72.24 | No | - | - | - |
| 37 | Yu-Wu | 72.05 | No | - | - | - |

Entries carrying a SOTA badge on the page: #1 (GIT2, Single Model) and #5 (Microsoft Cognitive Services team).